REMARKS/ARGUMENTS 

Claims 58-63, 69 and 70 are pending in this application. 

Claims 58-62 have been amended to remove the recitation of the phrase "native 
sequence." The amendments to the claims are fully supported by the specification and claims as 
originally filed and do not constitute new matter. Applicants believe that the current 
amendments place all claims in prima facie condition for allowance or, at least, in a better form 
for consideration on appeal. Accordingly, the consideration and entry of the present amendment 
after final rejection is respectfully requested. 

Applicants expressly reserve the right to pursue any canceled matter in subsequent 
continuation, divisional or continuation-in-part application(s). 

I. Priority 

The Examiner asserts that Applicants are entitled only to the filing date of the present
application, October 15, 2001, allegedly "because the claimed invention is not supported
by either a specific and substantial utility or a well established utility for the claimed
polypeptides." (Page 3 of the instant Office Action).

As previously stated in Applicants' Responses filed on October 4, 2004, and May 23,
2005, Applicants rely on the gene amplification assay for patentable utility. The results of the
gene amplification assay in lung tumors were first disclosed in U.S. Provisional Patent
Application Serial No. 60/100,038, filed on September 11, 1998, and the results of the gene
amplification assay in lung and colon tumors were disclosed in U.S. Provisional Patent
Application Serial No. 60/131,445, filed April 28, 1999, to both of which priority has been claimed in
this application. Accordingly, Applicants submit that the subject matter of the instant claims is
supported by the disclosure in U.S. Provisional Patent Application Serial No. 60/100,038, filed
on September 11, 1998, and in U.S. Provisional Patent Application Serial No. 60/131,445, filed
April 28, 1999. Therefore, the effective filing date of this application is April 28, 1999, the filing
date of U.S. Provisional Patent Application Serial No. 60/131,445.
II. Claim Rejections Under 35 U.S.C. §112, Second Paragraph

Claims 58-62 and 69-70 are rejected under 35 U.S.C. §112, second paragraph, as
allegedly being indefinite for the recitation of a "native sequence" polypeptide. The Examiner
asserts that "it is not clear how one of ordinary skill in the art would be able to determine if a
sequence is 'a native sequence' or not by looking at it." (Page 3 of the instant Office Action).
Without acquiescing to the PTO's arguments and solely in order to expedite prosecution of the
instant application, Claims 58-62 have been amended to remove the recitation of the phrase
"native sequence." Accordingly, withdrawal of the rejection under 35 U.S.C. §112, second
paragraph, is respectfully requested.

III. Claim Rejections Under 35 U.S.C. §§101 and 112, First Paragraph (Enablement)

Claims 58-63 and 69-70 remain rejected under 35 U.S.C. §101 allegedly "because the 
claimed invention is not supported by either a specific and substantial asserted utility or a well 
established utility." (Page 4 of the instant Office Action). 

Claims 58-63 and 69-70 further remain rejected under 35 U.S.C. §112, first paragraph,
allegedly "since the claimed invention is not supported by either a credible, specific and
substantial asserted utility or a well established utility . . . one skilled in the art clearly would not
know how to use the claimed invention." (Page 4 of the instant Office Action).

For the reasons outlined below, Applicants respectfully disagree and traverse the
rejections. With respect to Claims 58-63 and 69-70, Applicants submit that not only has the
Patent Office not established a prima facie case for lack of utility and enablement, but that the
PRO213-1 polypeptides possess a credible, specific and substantial asserted utility and are fully
enabled.

First of all, Applicants respectfully maintain the position that the specification discloses
at least one credible, substantial and specific asserted utility for the claimed PRO213-1
polypeptides for the reasons previously set forth in Applicants' Responses filed on October 4,
2004, and May 23, 2005.

Furthermore, as first discussed in Applicants' Response of October 4, 2004, Applicants
rely on the gene amplification data for patentable utility of the PRO213-1 polypeptide, and the
gene amplification data for the gene encoding the PRO213-1 polypeptide are clearly disclosed in
the instant specification under Example 114. As previously discussed, a ΔCt value of at least 1.0
was observed for PRO213-1 in at least 35 of the lung and colon primary tumors and tumor cell
lines listed in Table 9. Table 9 teaches that the nucleic acids encoding PRO213-1 showed 1.03
to 5.55 ΔCt units, which corresponds to 2^1.03- to 2^5.55-fold (i.e., 2.04- to 46.9-fold)
amplification, in 16 different human primary lung tumors: LT1, LT1a, LT3, LT4, LT6, LT7, LT9,
LT11, LT12, LT13, LT15, LT16, LT17, LT19, LT21 and LT22. PRO213-1 also showed 1.18 to
3.79 ΔCt units, which corresponds to 2^1.18- to 2^3.79-fold (i.e., 2.27- to 13.8-fold)
amplification, in 11 different human primary colon tumors: CT2, CT4, CT5, CT6, CT8, CT10,
CT12, CT14, CT15, CT16 and CT17. In addition, PRO213-1 showed 1.31 to 2.95 ΔCt units,
which corresponds to 2^1.31- to 2^2.95-fold (i.e., 2.48- to 7.73-fold) amplification, in three
different lung cancer cell lines (Calu-1, H441 and H810), and 1.22 to 2.08 ΔCt units, which
corresponds to 2^1.22- to 2^2.08-fold (i.e., 2.33- to 4.23-fold) amplification, in five different
colon cancer cell lines (HT29, SW403, LS174T, HCT15 and HCC2998). Accordingly, the
present specification clearly discloses overwhelming evidence that the gene encoding the
PRO213-1 polypeptide is significantly amplified in a substantial number of lung and colon
tumors.
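For illustration only, the conversion from ΔCt units to fold amplification used in the preceding paragraph can be checked with a short Python sketch. It assumes, as the paragraph does, that each ΔCt unit corresponds to one doubling of template (fold = 2^ΔCt, i.e., ideal PCR efficiency). The helper name `fold_amplification` and the dictionary `DELTA_CT_RANGES` are hypothetical; the ΔCt ranges are those quoted above from Table 9.

```python
# Hypothetical helper: converts a TaqMan delta-Ct value into fold amplification,
# assuming ideal PCR efficiency (one cycle = one doubling), i.e. fold = 2 ** delta_ct.
def fold_amplification(delta_ct: float) -> float:
    return 2.0 ** delta_ct


# Delta-Ct ranges for PRO213-1 as stated in the text (Table 9), keyed by sample group.
DELTA_CT_RANGES = {
    "primary lung tumors (16)": (1.03, 5.55),
    "primary colon tumors (11)": (1.18, 3.79),
    "lung cancer cell lines (3)": (1.31, 2.95),
    "colon cancer cell lines (5)": (1.22, 2.08),
}

for group, (low, high) in DELTA_CT_RANGES.items():
    # Three significant figures, matching the precision used in the text.
    print(f"{group}: {fold_amplification(low):.3g}- to "
          f"{fold_amplification(high):.3g}-fold")
```

Rounded to three significant figures, the computed ranges match the fold values stated in the paragraph, and a ΔCt of 1.0 corresponds to exactly a 2-fold increase, which is why 1.0 ΔCt unit serves as the 2-fold threshold.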

In further support, Applicants submitted, in their Response filed October 4, 2004, a
Declaration by Dr. Audrey Goddard. Applicants particularly draw the Examiner's attention to
page 3 of the Goddard Declaration, which clearly states that:

It is further my considered scientific opinion that an at least 2-fold increase in 
gene copy number in a tumor tissue sample relative to a normal (i.e., non- 
tumor) sample is significant and useful in that the detected increase in gene 
copy number in the tumor sample relative to the normal sample serves as a basis 
for using relative gene copy number as quantitated by the TaqMan PCR 
technique as a diagnostic marker for the presence or absence of tumor in a tissue 
sample of unknown pathology. Accordingly, a gene identified as being 
amplified at least 2-fold by the quantitative TaqMan PCR assay in a tumor 
sample relative to a normal sample is useful as a marker for the diagnosis of 
cancer, for monitoring cancer development and/or for measuring the efficacy of 
cancer therapy. (Emphasis added). 

As indicated above, the gene encoding the PRO213-1 polypeptide shows at least two-fold
amplification in 35 different lung and colon tumors and tumor cell lines. In addition, the
Goddard Declaration clearly establishes that the TaqMan real-time PCR method described in
Example 114 has gained wide recognition for its versatility, sensitivity and accuracy, and is in
extensive use for the study of gene amplification. The facts disclosed in the Declaration also
confirm that, based upon the gene amplification results, one of ordinary skill would find it
credible that PRO213-1 is a diagnostic marker of lung and colon cancer.

The Examiner asserts that "damaged, precancerous lung epithelium is often aneuploid,"
and states that "[o]ne skilled in the art would not conclude that PRO213-1 is a diagnostic probe
for lung cancer unless it is clear that PRO213-1 is amplified to a clearly greater extent in true
lung or colon tumor tissue relative to non-cancerous lung or colon epithelium." (Page 5 of the
instant Office Action). In support of this assertion, the Examiner refers to the reference by
Hittelman.

Applicants note that the title of the Hittelman paper is "Genetic Instabilities in Epithelial
Tissues at Risk for Cancer." Hittelman studied lung tissue from chronic smokers, which had
been exposed for years to carcinogenic tobacco smoke. As Hittelman explains, "[t]umors of the
aerodigestive tract have been proposed to reflect a 'field cancerization' process whereby the
whole tissue is exposed to carcinogenic insult (e.g., tobacco smoke) and is at increased risk for
multistep tumor development" (page 3). The detection of increases in chromosome number
therefore identifies cells that have begun the first steps in this multistep progression to cancer.
Even if these particular epithelial regions are not yet cancerous, their presence is strongly
correlated with the development of cancer in the target tissue as a whole. Hittelman concludes
that "the measurement of chromosome instability in the target tissue will be useful in
assessing cancer risk as well as response to intervention" (page 10; emphasis added).

Accordingly, Hittelman shows that an increase in chromosome number or gene
amplification is associated not with normal tissues, but with cancerous or pre-cancerous tissues,
and therefore, an increase in chromosome number or gene amplification is a useful marker for a
cancerous or pre-cancerous state. Detection of pre-cancerous cells or tissues is useful because,
as explained by Hittelman, it allows for assessing cancer risk as well as response to intervention.
Hence, Applicants respectfully submit that whether a pre-cancerous or a tumor sample were
analyzed, the showing of DNA amplification of the PRO213-1 gene would still be significant,
since it would lead to the diagnosis of either a pre-cancerous state or a cancerous state, which is
the utility asserted here.

Despite the Examiner's assertion that such a use "is not well-established in the prior art,"
it is clear, as discussed above, that the use of amplified genes as markers for assessing cancer
risk is explicitly contemplated in Hittelman et al. Further, the attached paper by Crowell et al
(Cancer Epidemiol. Biomarkers Prev. 5:631-637 (1996); copy enclosed as Exhibit 1) studies the
detection of trisomy 7 in nonmalignant bronchial epithelium from lung cancer patients and from
individuals at high risk for lung cancer. The authors concluded that "molecular analyses may
enhance the power for detecting premalignant changes in bronchial epithelium in high-risk
individuals" (Abstract). Thus, the use of amplified genes as markers for assessing cancer risk
was explicitly contemplated in the art as early as 1996.

The Examiner asserts that the data shown in Table 9 do not provide a basis for utility or
enablement of the claimed polypeptides, because "it is not predictable that gene amplification
results in increased mRNA expression, or that increased mRNA expression results in increased
protein production" (Page 6 of the instant Office Action). In support of this assertion, the
Examiner has previously cited references by Pennica et al. and Gygi et al. The Examiner asserts
that Pennica et al. was cited as "evidence showing a lack of correlation between gene (DNA)
amplification and elevated mRNA levels." (Page 7 of the instant Office Action). Applicants
respectfully submit that, for the reasons previously set forth in Applicants' Responses filed on
October 4, 2004, and May 23, 2005, the teachings of Pennica et al are specific to WISP genes,
and say nothing about the correlation of gene amplification and protein expression in general.
The Examiner asserts that Gygi et al was cited "as providing evidence that polypeptide levels
cannot be accurately predicted from mRNA levels, and that variances as much as 40-fold or
50-fold were not uncommon." (Page 7 of the instant Office Action). Yet the Examiner
acknowledges that "Gygi et al. demonstrates that high levels of mRNA generally correlate with
high levels of protein and that it appears that there is a general positive correlation between
mRNA levels and protein levels." (Page 8 of the instant Office Action). Thus, Gygi et al
supports Applicants' position that there is a positive correlation between the overexpression of
mRNA and protein.
In support of the assertion that there is "a poor correlation between mRNA expression
and protein abundance," the Examiner cites additional references by Lian et al and Fessler et al
(Page 8 of the instant Office Action). The Examiner asserts that Lian et al examined mRNA
versus protein levels in differentiating myeloid cells and found that there was a poor correlation
between mRNA expression and protein changes. Applicants submit that Lian et al only teach
that protein expression may not correlate with mRNA level in differentiating myeloid cells and
do not teach anything regarding such a lack of correlation for genes in general. Myeloid cell
differentiation relates to hematopoiesis and is an entirely different biological process from solid
tumor development, because these two processes involve entirely different regulatory mechanisms
and molecules. Analysis of surface antigens expressed on myeloid cells of the
granulocyte-monocyte-histiocyte series during differentiation in normal and malignant myelomonocytic cells
is useful in identifying and classifying human leukemias and lymphomas, but cannot be used in
diagnosis of any solid tumors. Therefore, even if the teaching of Lian et al accurately reflects
the correlation between mRNA and protein for the particular system studied, it cannot apply to
the tumor diagnosis assays of the present application.

In addition, the authors themselves admit that there are a number of problems with the 
data presented in this reference. At page 520 of this article, the authors explicitly express their 
concerns by stating that "[T]hese data must be considered with several caveats: membrane and
other hydrophobic proteins and very basic proteins are not well displayed by the standard 2DE
approach, and proteins presented at low level will be missed. In addition, to simplify MS
analysis, we used a Coomassie dye stain rather than silver to visualize proteins, and this
decreased the sensitivity of detection of minor proteins." (emphasis added). It is known in the art
that Coomassie dye stain is a very insensitive method of measuring protein. This suggests that 
the authors relied on a very insensitive measurement of the proteins studied. The conclusions 
based on such measurements can hardly be accurate or generally applicable. 

The Examiner also asserts that Fessler et al, who examined lipopolysaccharide-activated
neutrophils, "found a 'poor concordance between mRNA transcript and protein expression
changes' in human cells." (Page 6 of the instant Office Action). Again, as with Lian et al,
Fessler et al only examined the expression level of a few proteins/RNAs in response to LPS
stimulation, which involves an entirely different regulatory mechanism from that involved in
tumor development. Therefore, the teachings of Fessler et al do not apply here. Additionally,
the PTO has overlooked a number of limitations of the study by Fessler et al.

For example, as admitted by Fessler et al, protein identification by two-dimensional 
PAGE is limited to well-resolved regions of the gel, may perform less well with hydrophobic and 
high molecular weight proteins, and tends to select for more abundant protein species (page 
31301, col. 1). Harvesting of the LPS-incubated PMNs at 4 hours may have prevented detection 
of earlier, transient changes and may have thereby introduced artificial transcript-protein 
discordance. Furthermore, the post-LPS incubation, pre-two-dimensional PAGE cell washes 
would be expected to remove secreted proteins from further analysis. In addition, because 
protein binding of Coomassie Blue has a limited dynamic range and is typically not linear
throughout the range of detection, image analysis of Coomassie Blue-stained protein spots
should only be considered semi-quantitative (see page 31301, col. 1).

In summary, both Fessler et al and Lian et al have relied on insensitive and inaccurate
methods of measuring protein expression levels. The teachings of these two references cannot
be relied upon to establish a prima facie showing of lack of utility.

The Examiner suggests that a "very relevant reference" is Chen et al (Page 8 of the 
instant Office Action). The Examiner cites Chen et al to the effect that only twenty-eight of the 
165 protein spots (17%) or 21 of 98 genes (21.4%) had a statistically significant correlation 
between protein and mRNA expression data. Applicants respectfully submit that the analysis by 
Chen et al is not applicable to the present application. 

First, Applicants note that proteins selected for study by Chen et al were those detectable 
by staining of 2D gels. As noted in, for example, Haynes et al (Electrophoresis 19:1862-1871 
(1998); copy enclosed as Exhibit 2) there are problems with selecting proteins detectable by 2D 
gels. "It is apparent that without prior enrichment only a relatively small and highly selected 
population of long-lived, highly expressed proteins is observed. There are many more proteins 
in a given cell which are not visualized by such methods. Frequently it is the low abundance 
proteins that execute key regulatory functions" (page 1870, col. 1). Thus, by selecting
proteins detectable by staining of 2D gels, Chen et al are likely to have excluded from their
analysis many of the proteins most likely to be significant as cancer markers.

Secondly, Chen et al looked at expression levels across a set of samples including a large
number of tumor samples (76) along with a much smaller number of normal samples (9). The
tumor samples were taken from stage I and stage III lung adenocarcinomas, which were
classified as bronchoalveolar, bronchial derived, or both bronchial and bronchoalveolar derived.
Accordingly, the tissues examined were from different tissues in different stages of normal or
cancerous growth. The authors determined the relationship between mRNA and protein
expression by using the average expression values for all samples. The average value for each
protein or mRNA was generated using all 85 lung tissue samples. This resulted in negative
normalized protein values in some cases. Further, the authors chose an arbitrary threshold of
0.115 for the correlation to be considered significant. Accordingly, the Chen paper does not
account for different expression in different tissues or different stages of cancer.

Thirdly, no attempt was made to compare expression levels in normal versus tumor 
samples, and in fact the authors concede that they had too few normal samples for meaningful 
analysis (page 310, col. 2). As a result, the analysis in the Chen paper shows only that a number 
of randomly selected proteins have varying degrees of correlation between mRNA and protein 
expression levels within a set of different lung adenocarcinoma samples. The Chen paper does 
not address the issue of whether increased mRNA levels in the tumor samples taken together as 
one group, as compared to the normal samples as a group, correlated with increased protein 
levels in tumorous versus normal tissue. Accordingly, the results presented in the Chen paper 
are not applicable to the application at issue. 

The correct test of utility is whether the utility is "more likely than not." In the case of
the Chen reference, even if the analysis presented is correct (which is disputed), a review of the
correlation coefficient data presented in the Chen et al. paper indicates that it is more likely than
not that increased mRNA expression correlates with increased protein expression. A review of
Table I, which lists 66 genes [the paper incorrectly states there are 69 genes listed] for which
only one protein isoform is expressed, shows that 40 genes out of 66 had a positive correlation
between mRNA expression and protein expression. This clearly meets the test of "more likely
than not." Similarly, in Table II, 30 genes with multiple isoforms [again, the paper incorrectly
states there are 29] were presented. In this case, for 22 genes out of 30, at least one isoform
showed a positive correlation between mRNA expression and protein expression. Furthermore,
12 genes out of 29 showed a strong positive correlation [as determined by the authors] for at
least one isoform. No genes showed a significant negative correlation. It is not surprising that
not all isoforms are positively correlated with mRNA expression, since certain isoforms are likely
non-functional proteins. Thus, Table II also shows that it is more likely than not that protein
levels will correlate with mRNA expression levels.
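As a simple arithmetic check of the "more likely than not" reading of the two Chen et al. tables discussed above, the Python sketch below expresses each stated count as a proportion and compares it with one half. It is only a proportion comparison using the counts as stated in this Response, not a statistical significance test; the helper name `more_likely_than_not` and the `COUNTS` labels are hypothetical.

```python
from fractions import Fraction


def more_likely_than_not(successes: int, total: int) -> bool:
    """Hypothetical helper: True when successes/total is strictly greater than 1/2."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Exact rational comparison avoids any floating-point edge cases at 1/2.
    return Fraction(successes, total) > Fraction(1, 2)


# Counts as stated in the text for the two Chen et al. tables.
COUNTS = {
    "single-isoform genes with positive correlation": (40, 66),
    "multi-isoform genes with >= 1 positively correlated isoform": (22, 30),
}

for label, (k, n) in COUNTS.items():
    print(f"{label}: {k}/{n} = {k / n:.1%}; exceeds one half: {more_likely_than_not(k, n)}")
```

On these counts, 40 of 66 is roughly 61% and 22 of 30 is roughly 73%, both above the one-half line that the "more likely than not" standard implies.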

The same authors as in Chen et al. published a later paper, Beer et al., Nature Medicine
8(8):816-824 (2002) (copy enclosed as Exhibit 3), which described the expression of genes in
adenocarcinomas and compared that to protein expression. In this paper they report that "these
results suggest that the oligonucleotide microarrays provided reliable measures of gene
expression" (page 817). The authors also state that "these studies indicate that many of the genes
identified using gene expression profiles are likely relevant to lung adenocarcinoma." Clearly,
the authors of the Chen paper agree that microarrays provide a reliable measure of gene
expression levels and can be used to identify genes whose overexpression is associated with
tumors.

Similarly, the references previously submitted by Applicants (the Orntoft, Hyman, and 
Pollack references), also analyzed mRNA and protein expression levels for genes known to be 
amplified in tumor samples. These papers also indicate that it is more likely than not that 
increased gene expression levels correlate with increased expression of the protein. The Chen 
reference does not provide sufficient evidence to dispute this finding. 

The Examiner further cites Anderson et al to the effect that there was a poor correlation
(0.48) between mRNA and protein levels in liver cells. Applicants submit that the teachings of
Anderson et al do not apply to the presently claimed invention because Anderson et al. studied
mRNA/protein correlation in proteins obtained from liver tissue, while the present invention is
directed to polypeptides that are overexpressed in colon and lung tumors, which are an entirely
different cellular environment from liver tissue. It would be apparent that different
post-translational or post-transcriptional regulation mechanisms are involved in these two systems.
Therefore, the conclusion of Anderson et al does not apply to proteins associated with tumor
tissues. Moreover, the authors of this reference themselves acknowledged several experimental
limitations that restrict the accuracy of the data. For instance, the protein measurements rely on
Coomassie Brilliant Blue (CBB) binding, and it is well known that different proteins can bind CBB
with different affinities. More significantly, the authors did not measure actual mRNA abundance
for each protein, but looked at the numbers of clones found in a library. The precision of these
measurements is limited because several proteins studied were represented by only one or two
clones. As the authors admit, "such small numbers of clones lead to potentially large quantitative
errors because of sampling error" (page 536, col. 1). As can be seen in Table 1, the data from the
proteins represented by only one or two clones strongly affect the non-linearity of the total
dataset. Thus, these technical limitations are detrimental to the accuracy of the protein and mRNA
abundance data as well as the conclusions based on these data. Finally, even assuming it is
accurate, the conclusion by Anderson et al does not support the Examiner's position. To the
contrary, the data in Anderson et al suggest that there is a significant correlation between mRNA
and protein levels. Anderson et al observed a correlation coefficient of 0.48 between protein and
mRNA abundance. As shown, for example, in Chen et al, correlation coefficients over 0.25 are
deemed to be significant (see Table II, and page 309, col. 1). In fact, the highest correlation
coefficient reported by Chen et al is 0.4003, less than the 0.48 observed for the Anderson et al
data. Accordingly, the Examiner cannot rely on the teaching of Anderson et al to establish a
prima facie showing of lack of utility.

Applicants reiterate that the evidentiary standard to be used throughout ex parte 
examination in setting forth a rejection is a preponderance of the totality of the evidence under 
consideration. Thus, to overcome the presumption of truth that an assertion of utility by the 
applicant enjoys, the Examiner must establish that it is more likely than not that one of ordinary 
skill in the art would doubt the truth of the statement of utility. Only after the Examiner has
made a proper prima facie showing of lack of utility does the burden of rebuttal shift to the
applicant.

The Patent Office has failed to meet its initial burden of proof that Applicants' assertions of
utility are not substantial or credible. The arguments presented by the Examiner in combination
with the Lian et al., Fessler et al, Chen et al and Anderson et al papers do not provide
sufficient reasons to doubt the statements by Applicants that PRO213-1 has utility. As set forth
above, both Chen et al and Anderson et al support Applicants' position that there is a positive
correlation between the overexpression of mRNA and protein.

In contrast, Applicants have submitted ample evidence to show that, in general, if a gene
is amplified in cancer, it is more likely than not that the encoded protein will be expressed at an
elevated level. First, the articles by Orntoft et al, Hyman et al, and Pollack et al (made of
record in Applicants' Response filed October 4, 2004) collectively teach that, in general, gene
amplification increases mRNA expression. Second, the Declaration of Dr. Paul Polakis,
principal investigator of the Tumor Antigen Project of Genentech, Inc., the assignee of the
present application, shows that, in general, there is a correlation between mRNA levels and
polypeptide levels.

The Examiner asserts that "Orntoft et al. could only compare the levels of about 40 well- 
resolved and focused abundant proteins." (Page 10 of the instant Office Action; emphasis in 
original). While technical considerations did prevent Orntoft et al from evaluating a larger 
number of proteins, the ones they did look at showed a clear correlation between mRNA and 
protein expression levels. As Orntoft et al state, "In general there was a highly significant 
correlation (p<0.005) between mRNA and protein alterations. . . . 26 well focused proteins whose
genes had a known chromosomal location were detected in TCCs 733 and 335, and of these 19
correlated (p<0.005) with the mRNA changes detected using the arrays." (See page 42, column
2 to page 34, column 2). Accordingly, Orntoft et al clearly support Applicants' position that
proteins expressed by genes that are amplified in tumors are useful as cancer markers. 

The Examiner also appears to misunderstand the data presented by Hyman et al. The
Examiner has asserted that "of the 12,000 transcripts analyzed, a set of 270 was identified in
which overexpression was attributable to gene amplification." The Examiner concludes that
"[t]his proportion is approximately 2%; the Examiner maintains that 2% does not provide a
reasonable expectation that the slight amplification of PRO351 would be correlated with elevated
levels of mRNA, much less protein." (Page 10 of the instant Office Action). Applicants
respectfully submit that the Examiner appears to have misinterpreted the results of Hyman et al.
Hyman et al chose to do a genome-wide analysis of a large number of genes, most of which, as
shown in Figure 2, were not amplified. Accordingly, the 2% number is meaningless, as the low
figure mainly results from the fact that only a small percentage of genes are amplified in the first
place. The significant figure is not the percentage of genes in the genome that show
amplification, but the percentage of amplified genes that demonstrate increased mRNA and
protein expression.

The Examiner has further asserted that the Hyman reference "found 44% of highly 
amplified genes showing overexpression at the mRNA level, and 10.5% of highly overexpressed 
genes being amplified; thus, even at the level of high amplification and high overexpression, the 
two do not correlate." (Page 10 of the instant Office Action). Applicants submit that the 10.5% 
figure is not relevant to the issue at hand. One of skill in the art would understand that there can 
be more than one cause of overexpression. The issue is not whether overexpression is always, or 
even typically caused by gene amplification, but rather, whether gene amplification typically 
leads to overexpression. 

The Examiner's assertion is not consistent with the interpretation Hyman et al
themselves place on their data, stating that "The results illustrate a considerable influence of
copy number on gene expression patterns." (page 6242, col. 1; emphasis added). In the more
detailed discussion of their results, Hyman et al teach that "[u]p to 44% of the highly amplified
transcripts (CGH ratio, >2.5) were overexpressed (i.e., belonged to the global upper 7% of
expression ratios) compared with only 6% for genes with normal copy number." (See page
6242, col. 1; emphasis added). These details make it clear that Hyman et al set a highly
restrictive standard for considering a gene to be overexpressed; yet almost half of all highly
amplified transcripts met even this highly restrictive standard. Therefore, the analysis performed
by Hyman et al clearly shows that "it is more likely than not" that a gene which is amplified in
tumor cells will have increased gene expression.
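The denominator point can be made concrete with figures already quoted in this section. The Python sketch below is illustrative only: it divides the 270 amplification-associated transcripts by all 12,000 transcripts analyzed (the Examiner's approximately 2% figure), and separately compares the overexpression rate Hyman et al report for highly amplified transcripts (up to 44%) with the rate for genes of normal copy number (6%). Because 44% is stated as an upper bound, the resulting ratio is likewise an upper-bound comparison; the variable names are hypothetical.

```python
# Figures as quoted in the text (Office Action and Hyman et al.).
TRANSCRIPTS_ANALYZED = 12_000
AMPLIFICATION_LINKED = 270  # transcripts whose overexpression was attributed to amplification

# Denominator = every transcript analyzed, amplified or not (the ~2% figure).
share_of_all_transcripts = AMPLIFICATION_LINKED / TRANSCRIPTS_ANALYZED

# Denominator = transcripts within a given copy-number class.
OVEREXPRESSED_IF_HIGHLY_AMPLIFIED = 0.44  # "up to 44%" (CGH ratio > 2.5)
OVEREXPRESSED_IF_NORMAL_COPY = 0.06

relative_rate = OVEREXPRESSED_IF_HIGHLY_AMPLIFIED / OVEREXPRESSED_IF_NORMAL_COPY

print(f"share of all transcripts: {share_of_all_transcripts:.2%}")
print(f"overexpression rate, highly amplified vs. normal copy: {relative_rate:.1f}x")
```

The same data thus look rare when measured against the whole genome, yet show highly amplified transcripts being overexpressed at several times the rate of normal-copy genes; which denominator is appropriate is the point argued above.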

The Examiner further asserts that Hyman et al and Pollack et al do not examine protein
expression. Applicants respectfully submit that the Orntoft et al, Hyman et al and Pollack et al
references were submitted primarily as evidence that, in general, gene amplification increases
mRNA expression. With regard to the correlation between mRNA expression and protein levels,
Applicants previously submitted a Declaration by Dr. Polakis, principal investigator of the
Tumor Antigen Project of Genentech, Inc., the assignee of the present application, to show that
mRNA expression correlates well with protein levels, in general. As previously discussed, the
Utility Examination Guidelines¹ state, "Office personnel must accept an opinion from a qualified
expert that is based upon relevant facts whose accuracy is not being questioned; it is improper to
disregard the opinion solely because of a disagreement over the significance or meaning of the
facts offered."

The Examiner states that in assessing the weight to be given expert testimony, "the
examiner may properly consider, among other things, the nature of the fact sought to be
established, the strength of any opposing evidence, the interest in the outcome of the case, and
the presence or absence of factual support for the expert's opinion." (Page 11 of the instant
Office Action). Applicants respectfully submit that, as discussed above, the PTO has failed to
provide evidence demonstrating a lack of correlation between gene amplification and increased
mRNA and protein levels, in general. Further, Dr. Polakis' statement that "an increased level of
mRNA in a tumor cell relative to a normal cell typically correlates to a similar increase in
abundance of the encoded protein in the tumor cell relative to the normal cell" is based on
factual, experimental findings, clearly set forth in the Declaration. The Office Action's
suggestion that Dr. Polakis might be misrepresenting these experimental results out of an interest
in the outcome of the case is inappropriate.

Taken together, although the scientific art contains some examples that do not fit the
central dogma of molecular biology, namely that polypeptide levels correlate with mRNA
levels, these instances are exceptions rather than the rule. In the majority of
amplified genes, the teachings in the art, as exemplified by Orntoft et al., Hyman et al, Pollack
et al, and the Polakis Declaration, overwhelmingly show that gene amplification influences gene
expression at the mRNA and protein levels. Therefore, one of skill in the art would reasonably
expect in this instance, based on the amplification data for the PRO213-1 gene, that the PRO213-1
polypeptide is concomitantly overexpressed. Thus, Applicants submit that the PRO213-1
polypeptides have utility in the diagnosis of cancer and, based on such a utility, one of skill in the
art would know exactly how to use the claimed polypeptides for diagnosis of cancer.

¹ Part IIB, 66 Fed. Reg. 1098 (2001).

The Examiner again cites Hu et al as showing that genes displaying a 5-fold change or
less in mRNA expression in tumors compared to normal tissue showed no evidence of a correlation
between altered gene expression and a known role in the disease, whereas among genes with a
10-fold or more change in expression level, there was a strong and significant correlation
between expression level and a published role in the disease. (Page 12 of the instant Office
Action).

Applicants respectfully submit that Hu et al does not conclusively show that it is more
likely than not that gene amplification does not result in increased expression at the mRNA and
polypeptide levels. Applicants respectfully submit that Hu et al manipulated various aspects of
the input data in order to minimize the false positives and negatives in their analysis. Applicants
further submit that the statistical analysis by Hu et al is not a reliable standard because the
frequency of citation only reflects the current research interest in a molecule but not the true
biological function of the molecule. Finally, the conclusion in Hu et al only applies to a specific
type of breast tumor (estrogen receptor (ER)-positive breast tumor) and cannot be generalized as
a principle governing microarray studies of breast cancer in general, let alone of other types of
cancer. In fact, even Hu et al admit that "[i]t is likely that this
threshold will change depending on the disease as well as the experiment. Interestingly, the
observed correlation was only found among ER-positive (breast) tumors not ER-negative
tumors." (See page 412, left column). Based on these findings, the authors add, "This
may reflect a bias in the literature to study the more prevalent type of tumor in the population.
Furthermore, this emphasizes that caution must be taken when interpreting experiments that may
contain subpopulations that behave very differently." (Id.; emphasis added).

The Examiner asserts that "Applicant is holding Hu et al to a higher standard than their 
own specification, which does not provide proper statistical analysis such as reproducibility, 
standard error rates, etc." (Page 13 of the instant Office Action). Applicants note that they do 
not argue that Hu et al lacks reproducibility, standard error rates, etc. for their data, given that 
Hu et al did a literature survey and conducted no actual experiments of their own. Rather, 
Applicants' point is that, given the various biases in selecting the data to be considered, as
acknowledged by the authors themselves, the collection of data surveyed by Hu et al simply 
does not demonstrate the conclusion the PTO attempts to reach concerning a general lack of 
correlation between microarray data and biological significance. Accordingly, Applicants 
respectfully submit that the Examiner has not shown a lack of correlation between microarray 
data and the biological significance of cancer genes. 

The Examiner has asserted that Hanna et al. supports the rejection, in that Hanna et al.
"show that gene amplification does not reliably correlate with protein over-expression, and thus
the level of polypeptide expression must be tested empirically." (Page 13 of the instant Office
Action). Applicants respectfully point out that the Examiner appears to have misread Hanna et
al. Hanna et al clearly state that gene amplification (as measured by FISH) and polypeptide
expression (as measured by immunohistochemistry, IHC) are well correlated ("in general, FISH
and IHC results correlate well" (Hanna et al p. 1, col. 2)). Only a subset of tumors
shows discordant results. Thus Hanna et al supports Applicants' position that it is more likely
than not that gene amplification correlates with increased polypeptide expression.

Applicants have clearly shown that the gene encoding the PRO213-1 polypeptide is
amplified in at least 34 lung and colon tumors. Therefore, the PRO213-1 gene, similar to the
HER-2/neu gene disclosed in Hanna et al, is a tumor-associated gene. Furthermore, as discussed
above, in the majority of amplified genes, the teachings in the art overwhelmingly show that
gene amplification influences gene expression at the mRNA and protein levels. Therefore, one
of skill in the art would reasonably expect in this instance, based on the amplification data for the
PRO213-1 gene, that the PRO213-1 polypeptide is concomitantly overexpressed.

However, even if gene amplification does not result in overexpression of the gene
product (i.e., the protein), an analysis of the expression of the protein is useful in determining the
course of treatment, as supported by the Ashkenazi Declaration and the Hanna article (submitted
with Applicants' Response filed October 4, 2004). The Examiner appears to view the testing
described in the Ashkenazi Declaration and the Hanna article as experiments involving further
characterization of the PRO213-1 polypeptide itself. In fact, such testing is for the purpose of
characterizing not the PRO213-1 polypeptide, but the tumors in which the gene encoding
PRO213-1 is amplified. The PRO213-1 polypeptides are therefore useful in tumor
categorization, the results of which become an important tool in the hands of a physician,
enabling the selection of a treatment modality that holds the most promise for the successful
treatment of a patient.

Finally, the Examiner asserts that "even if it could be established that gene amplification 
is reflected by increased polypeptide levels, the claims are broadly drawn to polypeptides that 
can be variants of the polypeptide of SEQ ID NO:506." The Examiner concludes that "such
variant sequences would not reasonably be expected to show changed levels for a particular 
disease state." (Page 5 of the instant Office Action). 

Applicants respectfully point out that the claims recite variants of SEQ ID NO:506
wherein the nucleic acid encoding said polypeptide is amplified in colon or lung tumors. Those
variants whose encoding nucleic acids are not amplified in colon or lung tumors are not
encompassed by the claims. It is understood that many polypeptides, and especially tumor
antigens, are known to have different isoforms or variants.² One of skill in the art would
therefore reasonably expect there to be variants of PRO213-1 that are also amplified in colon or
lung tumors. The specification provides detailed protocols for the gene amplification assay, in
Example 114, such that one of ordinary skill in the art could identify those variants meeting the
limitations of the claims, without any undue experimentation.

In conclusion, Applicants submit that the present rejection is based on the application of
an incorrect, elevated legal standard, on misconstruction of the references, and on erroneous
conclusions drawn therefrom. The issue of patentable utility should be assessed on the totality
of the evidence, using the preponderance-of-the-evidence standard. It is submitted that on the
totality of the evidence Applicants have clearly established that the claimed invention has a
substantial, specific and credible utility, for example, in the diagnosis of cancer. Further, based
on this utility and the disclosure in the specification, one skilled in the art at the time the
application was filed would know how to use the claimed polypeptides. Accordingly, Applicants
request that the Examiner reconsider and withdraw the rejection of Claims 58-63, 69 and 70
under 35 U.S.C. §§101 and 112.

² Peng et al., Cancer Research 64:8911-8918 (2004); Kiss et al., Anticancer Research 24:3965-3970
(2004); Perego et al., Molecular Carcinogenesis 42(4):229-239 (2005); Nagao et al., Genomics 85:462-471 (2005);
Hong et al., Cancer Research 64:5504-5510 (2004) (copies enclosed as Exhibits 4-8).

IV. Claim Rejections Under 35 U.S.C. §112, First Paragraph (Written Description) 

Claims 58-62, 69 and 70 remain rejected under 35 U.S.C. §112, first paragraph, as 
allegedly lacking adequate written description for the claimed variant polypeptides having at 
least 80-99% identity to amino acid residues 35-273 of SEQ ID NO:506, wherein the nucleic 
acid encoding the polypeptide is amplified in colon or lung tumors. 

Applicants respectfully submit that the instant specification evidences the actual
reduction to practice of the PRO213-1 polypeptide comprising amino acid residues 35-273 of
SEQ ID NO:506. The Examiner has acknowledged that polypeptides comprising the sequence
set forth in SEQ ID NO:506 meet the written description provision of 35 U.S.C. §112, first
paragraph. (Page 13 of the Office Action mailed March 16, 2005). Thus, the genus of native
sequence polypeptides with at least 80% sequence identity to amino acid residues 35-273 of SEQ
ID NO:506, which possess the functional property that the nucleic acid encoding the polypeptide
is amplified in colon or lung tumors, would meet the requirement of 35 U.S.C. §112, first
paragraph, as providing adequate written description.

The specification describes methods for the determination of percent identity between 
two amino acid sequences. (See page 123, line 24 to page 125, line 14). In fact, the 
specification teaches specific parameters to be associated with the term "percent identity" as 
applied to the present invention. The specification further provides detailed guidance as to 
changes that may be made to a PRO polypeptide without adversely affecting its activity (page 
180, line 9 to page 183, line 8). This guidance includes a listing of exemplary and preferred 
substitutions for each of the twenty naturally occurring amino acids (Table 6, page 182). The 
specification describes methods for one of ordinary skill in the art to identify polypeptides 
having at least 80% identity to amino acid residues 35-273 of SEQ ID NO:506 wherein the 
nucleic acid encoding the polypeptide is amplified in lung tumors. Example 114 of the present 
application provides step-by-step guidelines and protocols for the gene amplification assay. 
Thus one of ordinary skill in the art would have understood at the time of filing what was 
encompassed by the claims. 
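
The specification's own percent-identity parameters (page 123, line 24 to page 125, line 14) are not reproduced here. As a generic illustration only, the counting step can be sketched in Python, assuming a pairwise alignment has already been produced by some alignment program and is supplied as two equal-length strings with '-' marking gaps. The function names and the use of the reference span as the denominator are assumptions of this sketch. Because residues 35-273 of SEQ ID NO:506 span 239 residues, an 80% threshold measured over that span corresponds to at least 192 identical residues.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of a query against a reference from one pairwise alignment.

    Both strings must come from the same alignment (equal length, '-' marks a gap).
    A position counts as identical when both residues match and are not gaps;
    the denominator is the number of non-gap residues in the reference.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for q, r in zip(aligned_query, aligned_ref) if q == r and r != "-")
    ref_length = sum(1 for r in aligned_ref if r != "-")
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * identical / ref_length


def min_identical_residues(ref_length: int, threshold_pct: int) -> int:
    """Smallest number of identical residues that meets threshold_pct (integer ceiling)."""
    return -(-ref_length * threshold_pct // 100)


span_length = 273 - 35 + 1                           # residues 35-273, inclusive
print(span_length)                                   # 239
print(min_identical_residues(span_length, 80))       # 192
print(percent_identity("ACD-FGHIKL", "ACDEFGHIKL"))  # 90.0
```

The integer ceiling in min_identical_residues avoids floating-point rounding at the threshold boundary; a different denominator convention (for example, the length of the shorter sequence) would change the counts.
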

The Examiner asserts that "the skilled artisan cannot envision the detailed chemical
structure of an encompassed polypeptide, and therefore conception is not achieved until 
reduction to practice has occurred, regardless of the complexity or simplicity of the method of 
isolation" (Page 16 of the instant Office Action; emphasis in original). In support of this 
assertion, the Examiner cites the cases of Fiers v. Revel and Amgen v. Chugai. (Page 16 of the 
instant Office Action). 

Applicants submit that Fiers v. Revel and Amgen v. Chugai addressed conception and the
written description requirement in the context of DNA-related inventions. The Amgen court held
that conception of a DNA invention "has not been achieved until reduction to practice has
occurred, i.e., until after the gene has been isolated." 927 F.2d 1200, 1206 (Fed. Cir.), cert.
denied, 502 U.S. 856 (1991). The Fiers court extended this decision into the written description
arena, holding that "[i]f a conception of a DNA requires a precise definition, such as by structure,
formula, chemical name, or physical properties, as we have held, then a description also requires
that degree of specificity." Fiers, 984 F.2d at 1171. Since the instant claims are directed to
polypeptides, Fiers and Amgen are distinguishable on the facts and do not apply.

More recently, in Enzo Biochem, Inc. v. Gen-Probe Inc., 296 F.3d 1316 (Fed. Cir. 2002),
the court adopted the standard that "the written description requirement can be met by 'showing
that the invention is complete by disclosure of sufficiently detailed, relevant identifying
characteristics, . . . i.e., complete or partial structure, other physical and/or chemical properties,
functional characteristics when coupled with a known or disclosed correlation between function
and structure, or some combination of such characteristics.'" Id. at 1324. While the invention in
Enzo was a DNA, the holding has been treated as applicable to proteins as well.
Indeed, the court adopted the standard from the USPTO's Written Description Examination
Guidelines, which apply to both proteins and nucleic acids.

Accordingly, current applicable case law holds that biological sequences are not
adequately described solely by a description of their desired functional activities. The instant
claims meet the standard set by the Enzo court in that the claimed sequences are defined not only
by functional properties, but also by structural limitations. It is well established that a
combination of functional and structural features may suffice to describe a claimed genus. "An
applicant may also show that an invention is complete by disclosure of sufficiently detailed,
relevant identifying characteristics which provide evidence that applicant was in possession of
the claimed invention, i.e., complete or partial structure, other physical and/or chemical
properties, functional characteristics when coupled with a known or disclosed correlation
between function and structure, or some combination of such characteristics."³ As discussed
above, Applicants have recited structural features, namely, at least 80% sequence identity to
amino acid residues 35-273 of SEQ ID NO:506, which are common to the genus. The genus of
claimed polypeptides is further defined by a specific property of the encoding nucleic acid,
namely that it is amplified in colon or lung tumors.
Accordingly, a description of the claimed genus has been achieved.

This particular combination of functional activity and structural homology, as disclosed
in the specification, has been recognized by the USPTO as sufficient to describe a claimed genus
of polypeptides. The Examiner's attention is respectfully directed to Example 14 of the Synopsis
of Application of Written Description Guidelines issued by the U.S. Patent Office, which clearly
states that protein variants meet the requirements of 35 U.S.C. §112, first paragraph, as providing
adequate written description for the claimed invention, even if the specification contemplates but
does not exemplify variants of the protein, if (1) the procedures for making such variant proteins
are routine in the art, (2) the specification provides an assay for detecting the functional activity
of the protein, and (3) the variant proteins possess the specified functional activity and at least
95% sequence identity to the reference sequence.

As discussed above, the procedures for making the claimed variant polypeptides are well 
known in the art and described in the specification. The specification also provides an assay, 
shown in Example 114, for detecting the recited functional activity of the nucleic acids encoding 
the variant polypeptides. Finally, the claimed variant polypeptides possess both the specified 
functional activity and a defined degree of sequence identity to the reference sequence, amino 
acid residues 35-273 of SEQ ID NO:506. Accordingly, the claimed polypeptide variants meet 
the standards set forth in the Written Description Guidelines.



³ M.P.E.P. §2163 II(A)(3)(a).

Thus the specification provides adequate written description for polypeptides having at
least 80% identity to amino acid residues 35-273 of SEQ ID NO:506 wherein the nucleic acid
encoding the polypeptide is amplified in lung tumors. Applicants therefore respectfully request
that the Examiner reconsider and withdraw the written description rejection of Claims 58-62 and
69-70 under 35 U.S.C. §112, first paragraph.

V. Claim Rejections Under 35 U.S.C. §102 and 35 U.S.C. §103 

Claims 58-63 and 69 remain rejected under 35 U.S.C. §102(e) as being anticipated by
Holtzman et al (U.S. Published Patent Application 20020028508), with an effective priority date
of April 23, 1998. In particular, the Examiner alleges that Holtzman et al disclose a protein that
is 100% identical to the protein of SEQ ID NO:506. In addition, Claim 70 remains rejected
under 35 U.S.C. §103(a) as being unpatentable over Holtzman et al in view of Hopp et al.

Applicants have previously submitted a Declaration under 37 C.F.R. §1.131 by
Dr. Goddard, Dr. Godowski, Dr. Gurney, Ms. Roy and Dr. Wood that establishes that Applicants
had sequenced and cloned the claimed polypeptides, and had identified their homology to the
human growth arrest-specific 6 (gas6) protein, before April 23, 1998, which is earlier than the
effective priority date of Holtzman et al.

The Examiner states that the Declaration filed on October 4, 2004 is unsigned. 
Applicants respectfully submit that copies of the Declaration signed by all of the inventors were 
submitted with the Preliminary Amendment filed May 23, 2005. 

The Examiner states that the Declaration of Goddard, Godowski, Gurney, Roy and Wood
has been considered but is ineffective to overcome the Holtzman et al reference because the
Holtzman et al reference is a US patent application publication of an abandoned application
which has a continuation that claims the same patentable invention. The Examiner states that if
the reference and the instant application are commonly owned, the reference may be disqualified
as prior art by an affidavit or declaration under 37 C.F.R. §1.130. The Examiner further states
that if the reference and the instant application are not commonly owned, the reference can only
be overcome by establishing priority of invention through interference proceedings.

Applicants submit that the reference and the instant application are not commonly owned, 
and thus the reference cannot be disqualified as prior art by an affidavit or declaration under 37 
C.F.R. §1.130. The Declaration under 37 C.F.R. §1.131 establishes that Applicants had
conceived and reduced to practice the invention before the effective filing date of the reference 
by Holtzman et al. Applicants agree that the priority of the invention can only be resolved 
through interference proceedings. Applicants respectfully request that this matter be held in 
abeyance pending the determination that the instant claims are patentable. 

CONCLUSION

In conclusion, the present application is believed to be in prima facie condition for 
allowance, and an early action to that effect is respectfully solicited. Should there be any further 
issues outstanding, the Examiner is invited to contact the undersigned attorney at the telephone 
number shown below. 

Although no fees are due, the Commissioner is hereby authorized to charge any fees,
including any fees for extension of time, or credit overpayment to Deposit Account No. 08-1641,
referencing Attorney's Docket No. 39780-2630 P1C4. Please direct any calls in connection with
this application to the undersigned at the number provided below.

Respectfully submitted, 

Date: November 18, 2005 By: 

Barrie D. Greene (Reg. No. 46,740) 

HELLER EHRMAN LLP 

275 Middlefield Road 
Menlo Park, California 94025 
Telephone: (650) 324-7000 
Facsimile: (650) 324-0638



Abstract

Early identification and subsequent intervention are 
needed to decrease the high mortality rate associated with 
lung cancer. The examination of bronchial epithelium for 
genetic changes could be a valuable approach to identify 
individuals at greatest risk. The purpose of this 
investigation was to assay cells recovered from 
nonmalignant bronchial epithelium by fluorescence in situ 
hybridization for trisomy of chromosome 7, an alteration 
common in non-small cell lung cancer. Bronchial 
epithelium was collected during bronchoscopy from 16 
cigarette smokers undergoing clinical evaluation for 
possible lung cancer and from seven individuals with a 
prior history of underground uranium mining. Normal 
bronchial epithelium was obtained from individuals 
without a prior history of smoking (never smokers). 
Bronchial cells were collected from a segmental bronchus 
in up to four different lung lobes for cytology and tissue 
culture. Twelve of 16 smokers were diagnosed with lung 
cancer. Cytological changes found in bronchial epithelium 
included squamous metaplasia, hyperplasia, and atypical 
glandular cells. These changes were present in 33, 12, and 
47% of sites from lung cancer patients, smokers, and 
former uranium miners, respectively. Less than 10% of 
cells recovered from the diagnostic brush had cytological
changes, and in several cases, these changes were present
within different lobes from the same patient. Background
frequencies for trisomy 7 were 1.4 ± 0.3% in bronchial
epithelial cells from never smokers. Eighteen of 42 
bronchial sites from lung cancer patients showed 
significantly elevated frequencies of trisomy 7 compared 
to never smoker controls. Six of the sites positive for 
trisomy 7 also contained cytological abnormalities. 
Trisomy 7 was found in six of seven patients diagnosed 
with squamous cell carcinoma, one of one patient with 
adenosquamous cell carcinoma, but in only one of four 
patients with adenocarcinoma. A significant increase in 
trisomy 7 frequency was detected in cytologically normal 
bronchial epithelium collected from four sites in one 
cancer-free smoker, whereas epithelium from the other 
smokers did not contain this chromosome abnormality. 
Finally, trisomy 7 was observed in almost half of the 
former uranium miners; three of seven sites positive for 
trisomy 7 also exhibited hyperplasia. Two of the former 
uranium miners who were positive for trisomy 7 
developed squamous cell carcinoma 2 years after 
collection of bronchial cells. To determine whether the 
increased frequency of trisomy 7 reflects generalized 
aneuploidy or specific chromosomal duplication, a 
subgroup of samples was evaluated for trisomy of 
chromosome 2; the frequency was not elevated in any 
of the cases as compared with controls. The studies 
described in this report are the first to detect and 
quantify the presence of trisomy 7 in subjects at risk for 
lung cancer. These results also demonstrate the ability to 
detect genetic changes in cytologically normal cells, 
suggesting that molecular analyses may enhance the 
power for detecting premalignant changes in bronchial 
epithelium in high-risk individuals. 

Introduction 

Although lung cancer is the leading cause of cancer death in the 
United States (1), early detection and intervention could de- 
crease the high mortality rate associated with this disease if 
sensitive screening approaches could be developed (2-4). Early 
detection may be feasible because the entire respiratory tract is 
exposed to inhaled carcinogens; therefore, the whole lung is at 
risk for developing multiple, independently initiated sites. This 
"field cancerization" condition (5) is supported clinically by a 
high frequency of second primary tumors in lung cancer patients
(6-9) and by the occurrence of progressive histological
premalignant changes throughout the lower respiratory tract of 
cigarette smokers (10, 11). Moreover, recent studies using 
pathological tissues obtained after lung resection or autopsy 
have identified genetic aberrations associated with lung cancer 
in nonmalignant bronchial epithelium adjacent to tumors
(12-16).

Although examination of pathological samples is useful 
for identifying genetic changes associated with carcinogenesis, 
this invasive approach for collection of clinical samples necessary
for early detection would not be appropriate for screening.
However, bronchial epithelial cells harvested using routine
clinical procedures could be examined for genetic changes as an 
initial approach for detecting individuals at high risk for lung 
cancer. This approach could also provide genetic markers for 
evaluating the effectiveness of chemoprevention regimens. 
Bronchoscopy provides direct access to viable cells within the 
airways and is a commonly used tool for obtaining samples 
from the lower respiratory tract, including bronchial epithelium
(17). This procedure can be used to repeatedly sample the
bronchial epithelium over time and to collect viable cells that 
can be expanded through tissue culture for functional assays. 

Because of field cancerization, genetic abnormalities
should be dispersed throughout the bronchial epithelium of
persons at risk for lung cancer. The purpose of this investiga- 
tion was to test this hypothesis by sampling nonmalignant 
bronchial epithelium from distinct locations within four differ- 
ent lobes of the lung from persons at risk for lung cancer and 
then assaying the bronchial cells for the presence of specific 
genetic abnormalities. Trisomy of chromosome 7 was examined
in these cells, because this alteration is common in solid tumors
of several different organ systems, including lung cancer (18, 19).
In addition, trisomy 7 has been detected in premalignant lesions
such as villous adenoma of the colon (20), in the colonic mucosa
of individuals with familial polyposis (21), and in the far margins
of some resected lung tumors (22).
Our results demonstrate that trisomy 7 can be detected in 
nonmalignant bronchial epithelium from patients with lung 
cancer distant to the site of the tumor and in individuals without 
tumors who are at high risk for lung cancer development. 
Together, these studies suggest that an extra copy of chromo- 
some 7 may be an intermediate biomarker of ongoing field 
carcinogenesis. 

Materials and Methods 

Subject Recruitment. Bronchial epithelium was collected 
from 16 cigarette smokers undergoing a diagnostic workup for 
possible lung cancer and from 7 individuals with a prior history 
of underground uranium mining, 5 of whom were also smokers.
Three individuals who had never smoked were also recruited to 
obtain bronchial epithelium not exposed directly to either to- 
bacco smoke or radon progeny. 

Pathology and Exposure History. Twelve of the 16 cigarette 
smokers who underwent diagnostic bronchoscopy were diagnosed
with NSCLC.¹ Seven tumors were characterized histologically
as SCCs, four tumors were ACs, and one tumor was
an adenosquamous cell carcinoma. Lung cancer was not evi- 
dent in the other four subjects. Smoking histories ranged from 
15 to 120 pack-years (defined as the number of packs of cigarettes
smoked per day times the number of years smoked). All of the
former uranium miners worked underground between 2 and 20 
years, with a range of 27-527 working level months. Five of the 
seven miners had smoking histories that ranged from 20-60 
pack-years. 
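
Pack-years are conventionally computed as packs smoked per day multiplied by years smoked. The arithmetic can be sketched in Python as follows; the 20-cigarette pack size and the function name are assumptions for illustration, not values taken from this study.

```python
CIGARETTES_PER_PACK = 20  # assumption: standard 20-cigarette pack, not stated in the study


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Cumulative smoking exposure in pack-years (packs per day x years smoked)."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    packs_per_day = cigarettes_per_day / CIGARETTES_PER_PACK
    return packs_per_day * years_smoked


print(pack_years(20, 15))  # 15.0: one pack a day for 15 years
print(pack_years(40, 60))  # 120.0: two packs a day for 60 years
```

For example, one pack a day for 15 years and two packs a day for 60 years give the two ends of the 15 to 120 pack-year range reported above; the individual smoking histories are not broken down here.
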

Bronchoscope Collection and Processing of Bronchial Epithelium.
A protocol was developed for harvesting viable bronchial epithelium
from the lower respiratory tract using a standard cytology brush
during bronchoscopy. After introduction into the lower respiratory
tract, the bronchoscope was directed into each upper and lower lobe,
and the carinal margin of a segmental orifice, usually the second
and third bifurcation within the upper and lower lobes, respectively,
was brushed. These sites were chosen because (a) they are
high-deposition areas for particles; (b) they are associated
frequently with histological changes in smokers; and (c) they
represent sites where tumors commonly occur (11, 23). The area was
first washed with saline to remove any nonadherent cells. Sites were
not brushed if a tumor was visualized within 5 cm of the site. After
brushing, the brush was withdrawn, placed in serum-free medium, and
kept on ice until processed. Each site was brushed twice. The
procedure was well tolerated by all subjects, and no complications
were noted related to the brushing procedure.

¹ The abbreviations used are: NSCLC, non-small cell lung cancer;
SCC, squamous cell cancer; AC, adenocarcinoma; EGFR, epidermal growth
factor receptor; FISH, fluorescence in situ hybridization; LOH, loss
of heterozygosity; BEGM, Bronchial Epithelium Growth Medium.

Bronchial cells were collected from only two of the sites
in two of the subjects, from three sites in two subjects, and from
all four sites in the remaining subjects. Although only two sites
were brushed initially in case 1, cells were obtained from all
four sites in this subject during a repeat bronchoscopy performed
after the initial procedure did not yield a diagnosis.
Samples were obtained from all four sites in the cancer-free
current smokers and in the never smokers. In addition, bronchial
epithelial cells derived at autopsy by Clonetics, Inc. (San
Diego, CA) from four never smokers were also obtained to
serve as additional controls. Only two sites sampled from most
of the former uranium miners were available for analysis because
cells recovered from the other sites had been used exclusively
for cytology in another investigation.⁴

Bronchial Epithelial Cell Culture. Replicative cultures of the
bronchial epithelial cells obtained by the procedure described
above were established in our laboratory (24) using a serum-free
medium (BEGM; Clonetics, Inc.) that is optimal for growth
of these cells. Cells were removed from brushes by vigorous
shaking in BEGM; cells from one brush were prepared for
cytological analyses, and cells from the other brush were
washed, resuspended in BEGM, seeded onto 60-mm fibronectin-coated
plates, and grown at 37°C in 3% CO₂ and 21% O₂
until 80% confluence. Prior to passage, aliquots of cells were
cryopreserved and stored at -145°C; other samples of cells
were fixed in methanol:acetic acid (3:1). Next, the cells were
washed four to six times in methanol:acetic acid and then
dropped onto slides (about 2 × 10⁵ cells/slide). The effects of
cell culture on the frequency of trisomy 7 in nonmalignant
bronchial epithelium were examined by placing cells dispersed
from brushes directly onto microscope slides followed by fixation.

Cytology. Cells from one brush from each bronchial collection 
site were prepared for cytological analysis by smearing the cells 
across a microscope slide. The cells were then fixed with 96% 
ethanol and stained according to the Papanicolaou procedure 
(25) to facilitate morphological evaluation by a cytopathologist. 
4 Unpublished data.

Table 1 Frequency of trisomy 7 in bronchial epithelial cells from lung cancer
patients(a)

Case  Age  Smoking     Tumor      Brush         Cytological    Trisomy 7
           (pack-yrs)  diagnosis  location      diagnosis      (frequency, %)
1     64   104         SCC        RLL           N              2.8(b)
                                  RUL           AGC            4.0(b)
                                  RLL(c)        N              3.0(b)
                                  RUL(c)        N              4.0(b)
                                  LLL(c)        N              6.0(b)
                                  LUL(c)        SM             4.3(b)
2     69   26          SCC        RUL           SM             2.8(b)
                                  LLL           SM             3.3(b)
                                  LUL           N              3.8(b)
3     65   120         SCC        RLL           AGC            2.0
                                  RUL           AGC            2.3(b)
                                  LLL           AGC            2.0
4     52   90          AC         RLL           SM             1.5
                                  RUL           N              1.8
                                  LLL           SM             1.5
                                  LUL           SM             1.8
5     70   50          SCC        RLL           N              1.5
                                  RUL           N              1.5
                                  LLL           N              1.5
                                  LUL           SM             1.3
6     61   93          AC         RLL           N              1.5
                                  RUL           N              1.3
                                  LLL           N              2.0
                                  LUL           N              1.5
7     58   40          SCC        RLL           N              1.8
                                  RUL           N              2.3(b)
                                  LLL           N              2.5(b)
                                  LUL           N              2.8(b)
8     59   120         AdSCC      RLL           N              1.5
                                  RUL           N              2.0
                                  LLL           N              2.5(b)
                                  LUL           AGC            2.0
9     65   71          SCC        RLL           SM             2.0
                                  RUL           SM             2.5(b)
10    63   45          AC         RLL           N              1.0
                                  RUL           N              1.8
                                  LLL           N              1.8
                                  LUL           N              1.3
11    61   95          AC         LLL           N              2.5(b)
                                  LUL           N              2.8(b)
12    76   17          SCC        RLL           N              2.0
                                  RUL           N              2.3(b)
                                  LLL           N              2.3(b)
                                  LUL           N              2.3(b)

Detection of Trisomy 2 and Trisomy 7. Trisomy 2 and trisomy 7
were determined by hybridization of cells with a biotinylated
chromosome 2 or 7 centromere probe (Oncor; Gaithersburg, MD).
The probes were denatured in hybridization buffer at 70°C for
5 min, and the slides were immersed in 70% formamide-2× SSPE
at 70°C for 2 min. The probe was then applied to the slides,
which were incubated in a humidified chamber at 37°C for 16 h.
After incubation, the slides were washed in 0.25× SSPE (10 mM
sodium phosphate monobasic monohydrate; 1 mM
ethylenediamine tetraacetic acid disodium salt, dihydrate;
150 mM sodium chloride, pH 7.4) for 5 min at 72°C, and the
probe was detected with fluorescein-labeled avidin. Cell nuclei
were visualized with propidium iodide.
Data Analysis. The number of centromeric hybridization signals
in each cell was evaluated in 400 cells/slide, and the
frequency of trisomy 7 on each slide was calculated by dividing
the total number of cells exhibiting three hybridization signals
by the total number of cells counted on that slide. Twenty percent
of the slides were scored by a second person, and frequencies
for trisomy 7 differed by <0.4%. The total numbers of sites
positive for trisomy 7 in subjects with SCC and AC were
compared using Fisher's exact test.
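The two calculations in the Data Analysis paragraph can be sketched in Python. This is an illustration under stated assumptions, not the authors' software: `trisomy_frequency` and `fisher_exact_2x2` are hypothetical helper names, the per-cell signal counts are invented, and the 2 × 2 site counts are our own tally from Table 1 (case 1 counted once per lobe, the adenosquamous case 8 excluded). Because the paper does not say whether its Fisher's exact test was one- or two-sided, both P-values are returned.

```python
from fractions import Fraction
from math import comb


def trisomy_frequency(signals_per_cell):
    """Percentage of scored nuclei showing exactly three centromeric signals."""
    if not signals_per_cell:
        raise ValueError("no cells scored")
    trisomic = sum(1 for n in signals_per_cell if n == 3)
    return 100 * trisomic / len(signals_per_cell)


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (one-sided P for an excess in cell a, two-sided P), where the
    two-sided value sums every table no more probable than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    lo, hi = max(0, col1 - (total - row1)), min(row1, col1)

    def pmf(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return Fraction(comb(row1, k) * comb(total - row1, col1 - k), denom)

    p_obs = pmf(a)
    one_sided = sum(pmf(k) for k in range(a, hi + 1))
    two_sided = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs)
    return float(one_sided), float(two_sided)


# Hypothetical slide: 400 nuclei scored, 9 of them with three signals.
slide = [2] * 391 + [3] * 9
print(trisomy_frequency(slide))  # 2.25

# Sites positive / negative for trisomy 7, tallied from Table 1 (our reading).
scc_pos, scc_neg = 15, 9   # SCC cases 1, 2, 3, 5, 7, 9, 12 (24 sites)
ac_pos, ac_neg = 2, 12     # AC cases 4, 6, 10, 11 (14 sites)
print(fisher_exact_2x2(scc_pos, scc_neg, ac_pos, ac_neg))
```

With these tallied counts the sketch gives a one-sided P of about 0.0045 and a two-sided P of about 0.0063. How this compares with the reported P < 0.005 depends on the sidedness and site-counting conventions, which the text leaves open.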

Results 

Cytology. Squamous metaplasia and atypical glandular cells,
the only cytological abnormalities observed in lung cancer
patients, were present in 32% of the samples (Table 1). These
cytological changes were observed in <10% of the cells
recovered from the diagnostic brush. Two subjects had three sites
with cytological abnormalities, and five subjects had no
cytological abnormalities. No samples contained tumor cells by
cytology, although in five subjects one of the four sites was
collected from the same lobe where a tumor was later diagnosed.

Two of the 16 sites in smokers without lung cancer were
cytologically abnormal (both in the same person; Table 2),
whereas no atypical cells were present in the 12 sites from the
three never smokers (Table 3). In former uranium miners,
hyperplasia was present in bronchial cells collected from all
four sites in one person, and in one site in each of two
additional people (Table 2).

Culturing of Bronchial Epithelial Cells. The efficiency of
establishing replicative cultures of the cells obtained by
bronchial brushing was 100%. The serum-free medium used for
these cultures is optimal for growing bronchial epithelial cells
and does not support fibroblastic cell replication (25).
Therefore, the cells were uniformly epithelioid in appearance. Growth
potential was evaluated by passaging cells from all seven of the
uranium miner cases and from cases 1-6 of the lung cancer
patients. Some of these cultures were maintained for up to nine
passages (a minimum of 16 population doublings), and many
underwent 30 divisions before senescence. However, none
exhibited an indefinite population-doubling potential.
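Population doublings are conventionally computed per passage as PD = log2(cells harvested / cells seeded) and then summed over passages. The paper does not state how its doubling counts were obtained, so the sketch below assumes that standard formula. The function names and the cell counts in the example are hypothetical.

```python
import math


def population_doublings(cells_seeded, cells_harvested):
    """Doublings achieved in one passage: log2(harvested / seeded)."""
    if cells_seeded <= 0 or cells_harvested <= 0:
        raise ValueError("cell counts must be positive")
    return math.log2(cells_harvested / cells_seeded)


def cumulative_doublings(passages):
    """Sum per-passage doublings over a list of (seeded, harvested) pairs."""
    return sum(population_doublings(s, h) for s, h in passages)


# Hypothetical culture: every passage seeded at 1e5 cells and harvested at
# 4e5 cells, i.e. two doublings per passage, carried through nine passages.
history = [(1e5, 4e5)] * 9
print(cumulative_doublings(history))  # 18.0
```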
Detection of Trisomy 7 in Nonmalignant Bronchial Epithelium.
Background rates of trisomy 7 were determined by examining
normal human bronchial epithelial cell lines derived
from autopsy cases of never smokers and bronchial epithelium
collected from never smokers during bronchoscopy. In bronchial
cell lines (passage 2) from four donors and bronchial
epithelial cell samples obtained by bronchial brushing from the
recruited never smokers (Table 3), only 1.4 ± 0.3% (SD) of the
cells contained three hybridization signals for chromosome 7,
with values ranging from 1 to 1.8%. These values agree with
those reported by the manufacturer of the probe. Therefore,
trisomy 7 frequencies of >2.0% (>2 SD above the mean for
controls) were considered significantly different from controls.
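The cutoff follows directly from the control statistics: 1.4% + 2 × 0.3% = 2.0%. A site counts as positive only when its frequency is strictly above that value. The minimal Python sketch below uses a hypothetical helper name. It relies on decimal arithmetic so that a reading of exactly 2.0% cannot be misclassified by binary floating-point rounding. The example frequencies are the three sites listed for case 3 in Table 1.

```python
from decimal import Decimal

CONTROL_MEAN = Decimal("1.4")  # % trisomy 7 in never-smoker controls
CONTROL_SD = Decimal("0.3")    # standard deviation of the controls
CUTOFF = CONTROL_MEAN + 2 * CONTROL_SD  # mean + 2 SD = 2.0%


def is_elevated(frequency_percent):
    """True if a site's trisomy 7 frequency exceeds mean + 2 SD of controls."""
    return Decimal(frequency_percent) > CUTOFF


case_3 = {"RLL": "2.0", "RUL": "2.3", "LLL": "2.0"}
positive_sites = [site for site, f in case_3.items() if is_elevated(f)]
print(CUTOFF, positive_sites)  # 2.0 ['RUL']
```

For case 3, only the right upper lobe (2.3%) exceeds the cutoff. This matches the single significant site recorded for that patient.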

Passage 1 or 2 bronchial cells from lung cancer patients
were examined for trisomy 7. Eighteen of the 42 bronchial sites
(43%) sampled from the 12 lung cancer patients contained
trisomy 7 at frequencies ranging from 2.3 to 6.0% (Table 1;
Fig. 1). Three subjects (cases 1, 2, and 11) displayed trisomy 7
in all sites collected during bronchoscopy, and in two subjects
(cases 7 and 12), trisomy 7 was found in three of four sites
(Table 1). Six of the 18 sites positive for trisomy 7 also
contained cytologically abnormal cells. Trisomy 7 was found in
six of seven patients diagnosed with SCC, whereas only one of
four patients with AC displayed trisomy 7 in any site collected
at bronchoscopy. Case 8, which had histological features of both
SCC and AC, had one site positive for trisomy 7. Within this
small sample population, the frequency of trisomy 7-positive
sites in patients with SCC was significantly greater than in
patients with AC (P < 0.005).

Table 1 notes:
(a) RLL, right lower lobe; RUL, right upper lobe; LLL, left lower
lobe; LUL, left upper lobe; AGC, atypical glandular cells; SM,
squamous metaplasia; N, normal cells; AdSCC, adenosquamous
carcinoma.
(b) P < 0.05 as compared to never-smoker controls.
(c) Resampled 4 months later.

The reproducibility of detecting trisomy 7 at sites found to
be positive for this abnormality was investigated in one patient
(case 1) who required repeat bronchoscopy for clinical reasons.
Trisomy 7 was increased similarly in the two sites brushed
during both procedures, although cytological examination
showed atypical cells in one site from the first bronchoscopy
and cytologically normal cells from the same site collected
during the second procedure (Table 1). The other two sites
collected during the second bronchoscopy also showed elevated
frequencies of trisomy 7 in this patient.

Table 2 Frequency of trisomy 7 in bronchial epithelial cells from cancer-free
smokers and former uranium miners(a)

Case  Age  Smoking     Radon       Brush     Cytological  Trisomy 7
           (pack-yrs)  exposure    location  diagnosis    (frequency, %)
                       (WLM)
13    81   15          0           RLL       N            1.8
                                   RUL       AGC          1.5
                                   LLL       N            1.8
                                   LUL       SM           2.0
14    34   24          0           RLL       N            1.3
                                   RUL       N            1.3
                                   LLL       N            1.0
                                   LUL       N            1.3
15    68   51          0           RLL       N            4.0(b)
                                   RUL       N            3.0(b)
                                   LLL       N            4.3(b)
                                   LUL       N            3.5(b)
16    45   30          0           RLL       N            1.3
                                   RUL       N            1.5
                                   LLL       N            2.0
                                   LUL       N            1.8
17    59   8           27          LLL       N            3.0(b)
                                   LUL       N            3.0(b)
18    65   9           516         LUL       N            1.3
                                   RUL       N            3.3(b)
19    64   30          235         LUL       N            1.5
                                   RLL       N            1.0
20    56   0           186         LUL       N            2.0
                                   RLL       N            2.3(b)
21    64   0           214         RLL       H            1.8
22    64   9           577         LUL       N            1.8
                                   RLL       H            0.8
23    67   31          124         LLL       H            1.3
                                   LUL       H            2.8(b)
                                   RLL       H            2.5(b)
                                   RUL       H            3.3(b)

(a) Abbreviations are as indicated in Table 1 footnote. WLM, working
level month; H, hyperplasia.
(b) P < 0.05 as compared to never-smoker controls.

Trisomy 7 was detected in cytologically normal bronchial
epithelium collected from all four sites in one of the
cancer-free smokers (case 15; Table 2). Bronchial cells from the other
smokers did not contain this chromosome abnormality. In the
former uranium miners (cases 17-23), seven of 15 sites
collected during bronchoscopy were positive for trisomy 7. Three
of the positive sites were found in one subject (case 23) and also
contained basal cell hyperplasia. However, the other four
samples positive for trisomy 7 showed no cytological abnormality.

Two of the former uranium miners (cases 18 and 23)
developed lung cancer within 2 years of bronchial cell
collection. SCC was diagnosed in the right upper lobe of both
subjects. As noted in Table 2, both cases were positive for trisomy 7
in the right upper lobe brushing site obtained at the initial
bronchoscopy.

Tissue Culture Effects on Trisomy 7 Expression in Bronchial
Epithelium. The effect of tissue culture on trisomy 7
frequency was assessed by comparing the frequency of this
chromosome abnormality in freshly isolated bronchial
epithelium obtained directly from bronchial brushes ("preculture")
with that in passage 1 cells. This comparison was conducted on
cells collected from two different bronchial sites in three
different subjects [cases 11 and 16 and donor 7 (never smoker)].
Cultured samples positive for trisomy 7 in case 11 were also
positive in preculture cells from the same bronchial collection
site, whereas sites negative for trisomy 7 in cultured cells from
case 16 and the never smoker were also negative in preculture
cells (data not shown). Values for trisomy 7 differed by <0.3%
between preculture and cultured cells. The effect of passaging
cells on the frequency of trisomy 7 was also examined in
bronchial cells from case 1. Trisomy 7 frequency was similar in
cells from passages 1, 4, and 7.

Table 3 Interphase analysis of chromosome 7 in normal human bronchial
epithelial cells

Bronchial epithelial cell lines were established from never smokers (Clonetics)
after autopsy and from volunteers. The normal distribution of chromosome 7 copy
number as detected by FISH is shown by the percentage of cells exhibiting 1, 2,
3, or 4 hybridization signals. Four hundred cells containing hybridization
signals were counted per donor.

Donor  Age  Brush        Number of hybridization signals/cell (%)
            location(a)  1      2      3      4
1      6    NA           3.5    92.0   1.5    3.0
2      17   NA           2.3    95.5   1.3    1.0
3      15   NA           1.5    94.7   1.8    2.0
4      41   NA           2.0    94.8   1.0    2.3
5      45   RLL          1.0    95.5   1.8    1.7
            RUL          0.5    98.3   1.0    0.2
            LLL          1.3    96.5   1.0    1.2
            LUL          1.0    96.3   1.2    1.5
6      35   RLL          1.0    96.8   1.0    1.2
            RUL          2.5    93.3   1.7    2.5
            LLL          2.0    94.8   1.5    1.7
            LUL          1.8    94.2   1.8    2.2
7      33   RLL          0.5    98.2   0.8    0.5
            RUL          0.5    97.2   1.3    1.0
            LLL          1.2    96.8   1.3    0.7
            LUL          1.0    96.0   1.5    1.5

(a) Abbreviations are as indicated in the legend to Table 1. NA, not applicable.

Frequency of Trisomy 2 in Nonmalignant Bronchial Epithelium.
Aneuploidy has been detected in bronchial squamous
metaplasia, a likely precursor to SCC (26). To determine
whether the increased frequency of trisomy 7 detected in the
current study reflects generalized aneuploidy or a specific
chromosomal duplication, a subgroup of samples was evaluated for
trisomy of chromosome 2. The frequency of trisomy 2 in never
smokers was 1.5 ± 0.4% (data not shown). Bronchial cells
from eight subjects, six of whom had elevated frequencies of
trisomy 7, were evaluated. The frequency of trisomy 2 in these
samples did not differ from that in never smokers (Table 4).

Fig. 1. FISH for chromosome 7 in bronchial epithelial cells. Trisomy 7
is apparent in one cell in this field. Magnification, ×530.

Table 4 Frequency of trisomy 2 in bronchial epithelial cells from lung cancer
patients, cancer-free smokers, and former uranium miners(a)

Case          Tumor       Brush      Trisomy 2
              diagnosis   location   (frequency, %)
1             SCC         RLL        1.5
                          RUL        1.8
                          LLL        1.8
                          LUL        1.0
2             SCC         LLL        1.0
                          LUL        1.0
7             SCC         RLL        1.5
                          RUL        2.1
                          LLL        1.8
                          LUL        1.5
8             AC          RLL        0.3
                          RUL        1.5
                          LLL        0.8
[illegible]   None        RLL        1.0
                          RUL        0.8
                          LLL        1.0
                          LUL        1.3
15            None        RLL        1.8
                          RUL        2.0
                          LLL        1.0
                          LUL        1.3
11            None        LUL        1.9
                          RLL        0.8
23            None        RLL        1.5

(a) Abbreviations are as indicated in the legend to Table 1.

Discussion

The studies described in this report are the first to detect and
quantify an increase in trisomy 7 in the airway cells of subjects
at risk for lung cancer. The presence of trisomy 7 appeared to
be a specific chromosome gain and not due to generalized
aneuploidy in these cells. In addition, trisomy 7 in nonmalignant
epithelium from lung cancer patients was associated with
SCC tumor histology, suggesting that patients with this genetic
change may be at greater risk for developing SCC than other
histological forms of lung cancer. This supposition was
supported by the observation that two cancer-free former uranium miners
with bronchial cells positive for trisomy 7 ultimately developed
SCC. Finally, these results demonstrate the ability to detect
genetic changes in cytologically normal cells, suggesting that
molecular analyses may enhance the power for detecting
premalignant changes in bronchial epithelium in high-risk
individuals.

Cigarette smoking and the exposure of underground miners
to radon progeny are both well-established respiratory
carcinogens (27, 28). Tobacco smoke contains numerous mutagens
and carcinogens, and radon progeny that have been
inhaled and deposited on the respiratory epithelium release α
particles capable of damaging DNA (28). Although comparison
between findings in the cigarette smokers and the former uranium
miners is constrained by the number of participants in the
two groups, trisomy 7 was found in both groups. These results
are consistent with the synergism between smoking and radon
progeny, which suggests commonality in the pathways by
which the two carcinogens cause lung cancer (29).

The bronchial brushing method used for collecting cells
from the lower respiratory tract is rapid (10-12 min total for
two brushes at four different sites), well tolerated by the patient,
and permits collection of viable bronchial cells that can be
expanded through tissue culture at 100% efficiency. The
stability of these cells in culture was evident from the fact that the
frequency of trisomy 7 did not differ between primary brush
cells and cells propagated for up to seven passages.
Furthermore, this procedure is amenable to the production of sufficient
cell numbers (1 × 10^?) at low passage (one or two) to
accommodate multiple molecular analyses. Although the medium used
in culturing bronchial epithelial cells did not appear to
provide a selective growth advantage to cells harboring an
additional chromosome 7, modulation of medium
supplements might lead to the establishment of clonal populations of
premalignant cells. Such cell populations would greatly
facilitate the identification of additional early gene changes in
respiratory carcinogenesis.

The detection of trisomy 7 in multiple nonmalignant sites
within the bronchial tree supports the theory of field
cancerization (5), which states that diffuse exposure of the entire
respiratory tract to inhaled carcinogens causes the development
of multiple, independently initiated sites that can lead to tumor
development. Although the frequency of this chromosome
abnormality was relatively low (2.3-6.0%), these values were
consistent with the low percentage of cells within each brush
sample (<10%) that exhibited abnormal cytology. These results
are also similar to studies of chromosome gain in patients with
head and neck cancer, where trisomy 7 was detected at
frequencies of 2, 3, and 21% in histologically normal, hyperplastic, and
dysplastic cells, respectively (30).

The detection of trisomy 7 in normal, hyperplastic, and
metaplastic bronchial epithelium from cancer-free patients
extends a recent report describing LOH at chromosomes 3p, 5q,
and 9p in dysplastic premalignant bronchial lesions harvested
from current and former smokers by bronchoscopy (31). The
inability to detect LOH at these chromosome loci in normal or
early premalignant epithelium may stem from a difference in
sensitivity between the methodologies used. The low frequency
of trisomy 7 and cytologically abnormal cells collected at
bronchoscopy is consistent with a lack of clonality within the
brush cells. FISH assays on interphase cells permit screening of
individual cells, and sensitivity for detection is limited only by
the number of cells examined. In contrast, microsatellite
analyses for LOH cannot detect nonclonal changes but require that
the chromosome alteration be present in approximately
40-50% of the sample (32, 33).
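The point that FISH sensitivity is limited chiefly by the number of nuclei scored can be made concrete with a simple binomial model. This is our illustration, not an analysis from the paper. It assumes that each scored nucleus is independently trisomic with some true frequency, reuses the >2.0% cutoff from the Results, and asks how often a slide would be called positive. The helper names are hypothetical.

```python
from fractions import Fraction
from math import floor

CUTOFF = Fraction(2, 100)  # slide called positive above 2.0% trisomic nuclei


def min_positive_count(n_cells, cutoff=CUTOFF):
    """Smallest trisomic count k for which k / n_cells exceeds the cutoff."""
    return floor(n_cells * cutoff) + 1


def prob_called_positive(true_freq, n_cells, cutoff=CUTOFF):
    """P(slide exceeds cutoff) if each nucleus is trisomic with prob true_freq.

    Assumes nuclei are scored independently (binomial model).
    """
    if not 0 < true_freq < 1:
        raise ValueError("true_freq must lie strictly between 0 and 1")
    k_min = min_positive_count(n_cells, cutoff)
    ratio = true_freq / (1 - true_freq)
    pmf = (1 - true_freq) ** n_cells  # P(X = 0)
    below = 0.0
    for k in range(k_min):
        below += pmf
        pmf *= (n_cells - k) / (k + 1) * ratio  # now P(X = k + 1)
    return 1 - below


for n in (100, 400, 1600):
    # Chance of flagging a true 3% population vs. a 1.4% background slide.
    print(n, round(prob_called_positive(0.03, n), 3),
          round(prob_called_positive(0.014, n), 3))
```

Under these assumptions, scoring more nuclei raises the chance of detecting a genuinely elevated population and lowers the chance of a false call at background frequencies. This is the sense in which sensitivity scales with the number of cells examined.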

The role of trisomy 7 in lung cancer development has not
been elucidated. Increased expression of EGFR, which is
located on chromosome 7 (34), is observed in 50-80% of
NSCLCs (16, 35, 36). EGFR expression appears greater in SCC
than in AC (35, 36), and the gene is amplified in some cell lines
derived from SCC (37). These findings are consistent with our
hypothesis that acquisition of trisomy 7 in bronchial epithelium
could be prognostic for development of SCC. Moreover, expression of this
gene is also increased in nonmalignant bronchial epithelium
from NSCLC patients (16, 35) and in normal or premalignant
epithelium adjacent to head and neck tumors (38). Thus, altered
expression of EGFR could enable cells that have acquired
additional genetic changes to proliferate continually and escape
from terminal differentiation (39). In addition, the c-met
oncogene is also located on chromosome 7 and is overexpressed in
NSCLCs (40, 41). This oncogene encodes a transmembrane
tyrosine kinase (42) that functions as a receptor for
hepatocyte growth factor (43) and is involved in sustaining the
growth of NSCLC cells in culture (44).

Previous studies have detected mutations in p53 (12, 14,
35) and chromosome losses at 9p21 (45) and 3p (46) in preinvasive
bronchial lesions, and simple chromosome rearrangements in
normal bronchial epithelium from proximal airways (47) of
lung cancer patients. The prevalence of these genetic changes in
normal epithelium from persons at risk for lung cancer should
be quantified by FISH to define the temporal sequence of
somatic genetic changes that precede the development of clonal
lesions in the lung. This information will be invaluable in
providing biological markers that can qualitatively estimate the
extent of field cancerization in persons at risk for lung cancer
and can be used to assess the efficacy of chemointervention
trials. Ultimately, the efficiency of detecting these biological
markers in bronchial epithelium versus exfoliated epithelial
cells within sputum must be established to support the use of a
"genetic-based" screening approach for individuals at high risk
for lung cancer. The results of the current investigation have
identified one potential biomarker, trisomy 7, that may be
useful in early detection and intervention for lung
carcinogenesis.
Proteome analysis: Biological assay or data archive?

Paul A. Haynes [...] Ruedi Aebersold
Department of Molecular Biotechnology, University of Washington,
Seattle, WA, USA

In this review we examine the current state of proteome analysis. There are
three main issues discussed: why it is necessary to study proteomes; how
proteomes can be analyzed with current technology; and how proteome analysis
can be used to enhance biological research. We conclude that proteome
analysis is an essential tool in the understanding of regulated biological
systems. Current technology, while still mostly limited to the more abundant
proteins, enables the use of proteome analysis both to establish databases of
proteins present, and to perform biological assays involving measurement of
multiple variables. We believe that the utility of proteome analysis in future
biological research will continue to be enhanced by further improvements in
analytical technology.

1 Introduction

A proteome has been defined as the protein complement
expressed by the genome of an organism, or, in multicellular
organisms, as the protein complement expressed by a
tissue or differentiated cell [1]. In the most common
implementation of proteome analysis the proteins extracted
from the cell or tissue analyzed are separated by high
resolution two-dimensional gel electrophoresis (2-DE),
detected in the gel and identified by their amino acid
sequence. The ease, sensitivity and speed with which
gel-separated proteins [...] recently developed mass
spectrometry techniques have dramatically increased [...]
One of the most attractive features of such analyses is that
com[...]

Correspondence: Professor Ruedi Aebersold, Department of Molecular
Biotechnology, University of Washington, Box 357730, Seattle, WA
98195, USA

Abbreviations: CID, collision-induced dissociation; MS/MS, tandem
mass spectrometry; SAGE, serial analysis of gene expression

Keywords: Proteome / Two-dimensional polyacrylamide gel
electrophoresis / Tandem mass spectrometry



2 Rationale for proteome analysis 

The dramatic growth in both the number of genome
projects and the speed with which genome sequences
are being determined has generated huge amounts of
sequence information, for some species even complete
genomic sequences [15-17]. The description of the
state of a biological system by the quantitative
measurement of system components has long been a primary
objective in molecular biology. With recent technical
advances including the development of differential
display-PCR [18], cDNA microarray and DNA chip
technology [19, 20] and serial analysis of gene expression
(SAGE) [21, 22], it is now feasible to establish global and
quantitative mRNA expression maps of cells and tissues,
in which the sequence of all the genes is known, at a
speed and sensitivity which is not matched by current
protein analysis technology. Given the long-standing
paradigm in biology that DNA synthesizes RNA which
synthesizes protein, and the ability to rapidly establish
global, quantitative mRNA expression maps, the
questions which arise are why technically complex proteome
projects should be undertaken and what specific types of
information could be expected from proteome projects
which cannot be obtained from genomic and transcript
profiling projects. We see three main reasons for
proteome analysis to become an essential component in the
comprehensive analysis of biological systems: (i) protein
expression levels are not predictable from the mRNA
expression levels, (ii) proteins are dynamically modified
and processed in ways which are not necessarily
apparent from the gene sequence, and (iii) proteomes
are dynamic and reflect the state of a biological system.

2.1 Correlation between mRNA and protein expression
levels

Interpretations of quantitative mRNA expression profiles
frequently implicitly or explicitly assume that for specific
genes the transcript levels are indicative of the levels of
protein expression. As part of an ongoing study in our
laboratory, we have determined the correlation of
expression at the mRNA and protein levels for a population of
selected genes in the yeast Saccharomyces cerevisiae
growing at mid-log phase (S. P. Gygi et al., submitted for
publication). mRNA expression levels were calculated
from published SAGE frequency tables [22]. Protein
expression levels were quantified by metabolic
radiolabeling of the yeast proteins, liquid scintillation counting
of the protein spots separated by high resolution 2-DE,
and mass spectrometric identification of the protein(s)
migrating to each spot. The selected 80 samples
constitute a relatively homogeneous group with respect to
predicted half-life and expression level of the protein
products. Thus far, we have found a general trend but no
strong correlation between protein and transcript levels
(Fig. 1). For some genes studied, equivalent mRNA
transcript levels translated into protein abundances which
varied by more than 50-fold. Similarly, equivalent
steady-state protein expression levels were maintained by
transcript levels varying by as much as 40-fold (S. P. Gygi
et al., submitted). These results suggest that even for a
population of genes predicted to be relatively
homogeneous with respect to protein half-life and gene
expression, the protein levels cannot be accurately predicted
from the level of the corresponding mRNA transcript.
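The text reports a general trend and fold ranges but does not name the statistic behind "no strong correlation". As a hedged sketch, one common way to quantify such a comparison is a Pearson correlation on log-transformed abundances, together with the fold range among genes that share a transcript level. The function names are illustrative, and the test inputs are synthetic rather than the study's data.

```python
import math


def log_pearson(mrna_levels, protein_levels):
    """Pearson correlation between log10 mRNA and log10 protein abundances."""
    if len(mrna_levels) != len(protein_levels) or len(mrna_levels) < 2:
        raise ValueError("need two equal-length series of at least 2 values")
    xs = [math.log10(v) for v in mrna_levels]
    ys = [math.log10(v) for v in protein_levels]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fold_range(levels):
    """Ratio of the largest to the smallest abundance within a group of genes."""
    return max(levels) / min(levels)
```

Applied to the protein abundances of genes with equivalent transcript levels, `fold_range` expresses the kind of "more than 50-fold" spread described above. The log transform keeps a few very abundant proteins from dominating the correlation.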

2.2 Proteins are dynamically modified and processed

In the mature, biologically active form many proteins are
post-translationally modified by glycosylation,
phosphorylation, prenylation, acylation, ubiquitination or one or
more of many other modifications [23], and many
proteins are only functional if specifically associated or
complexed with other molecules, including DNA, RNA,
proteins and organic and inorganic cofactors. Frequently,
modifications are dynamic and reversible and may alter
the precise three-dimensional structure and the state of
activity of a protein. Collectively, the states of
modification of the proteins which constitute a biological system
are important indicators of the state of the system. The
type of protein modification and the sites modified at a
specific cellular state usually cannot be determined
from the gene sequence alone.

Figure 1. Correlation between mRNA and protein levels in yeast cells.
For a selected population of 80 genes, protein levels were measured
by 35S-radiolabeling and mRNA levels were calculated from
published SAGE tables. Inset: expanded view of the low abundance region.
For more experimental details, also see Figs. 5 and 6 (S. P. Gygi et al.,
submitted). [Plot not reproduced.]

2.3 Proteomes are dynamic and reflect the state of a
biological system

A single genome can give rise to many qualitatively and
quantitatively different proteomes. Specific stages of the
cell cycle and states of differentiation, responses to
growth and nutrient conditions, temperature and stress,
and pathological conditions represent cellular states
which are characterized by significantly different
proteomes. The proteome, in principle, also reflects events
that are under translational and post-translational
control. It is therefore expected that proteomics will be able
to provide the most precise and detailed molecular
description of the state of a cell or tissue, provided that the
external conditions defining the state are carefully
determined. In answer to the question of whether the study
of proteomes is necessary for the analysis of
biomolecular systems, it is evident that the analysis of mature
protein products in cells is essential, as there are numerous
levels of control of protein synthesis, degradation,
processing and modification which are only apparent by
direct protein analysis.



3 Description and assessment of current proteome 
analysis technology 

3.1 Technical requirements of proteome technology

In biological systems the level of expression as well as
the states of modification, processing and
macromolecular association of proteins are controlled and
modulated depending on the state of the system.
Comprehensive analysis of the identity, quantity and state of
modification of proteins therefore requires the detection and
quantitation of the proteins which constitute the system,
and analysis of differentially processed forms. There are
a number of inherent difficulties in protein analysis
which complicate these tasks. First, proteins cannot be
amplified. It is possible to produce large amounts of a
particular protein by over-expression in specific cell
systems. However, since many proteins are dynamically
post-translationally modified, they cannot be easily
amplified in the form in which they finally function in the
biological system. It is frequently difficult to purify from
the native source sufficient amounts of a protein for
analysis. From a technological point of view this
translates into the need for high-sensitivity analytical
techniques. Second, many proteins are modified and
processed post-translationally. Therefore, in addition to the
protein identity, the structural basis for differentially
modified isoforms also needs to be determined. The
distribution of a constant amount of protein over several
differentially modified isoforms further reduces the
amount of each species available for analysis. The
complexity and dynamics of post-translational protein
editing thus significantly complicate proteome studies.
Third, proteins vary dramatically with respect to their
solubility in commonly used solvents. There are few, if
any, solvent conditions in which all proteins are soluble
and which are also compatible with protein analysis. This
makes the development of protein purification methods
particularly difficult since both protein purification and
solubility have to be achieved under the same
conditions. Detergents, in particular sodium dodecyl sulfate
(SDS), are frequently added to aqueous solvents to
maintain protein solubility. The compatibility with SDS
is a big advantage of SDS polyacrylamide gel
electrophoresis (SDS-PAGE) over other protein separation
techniques. Thus, SDS-PAGE and two-dimensional gel
electrophoresis, which also uses SDS and other
detergents, are the most general and preferred methods for
the purification of small amounts of proteins, provided
that activity does not necessarily need to be maintained.
Lastly, the number of proteins in a given cell system is
typically in the thousands. Any attempt to identify and
categorize all of these must use methods which are as
rapid as possible to allow completion of the project
within a reasonable time frame. Therefore, a successful,
general proteomics technology requires high sensitivity,
high throughput, the ability to differentiate differentially
modified proteins, and the ability to quantitatively
display and analyze all the proteins present in a sample.

3.2 2-D electrophoresis-mass spectrometry: a common implementation of proteome analysis

The most common currently used implementation of proteome analysis technology is based on the separation of proteins by two-dimensional (IEF/SDS-PAGE) gel electrophoresis and their subsequent identification and analysis by mass spectrometry (MS) or tandem mass spectrometry (MS/MS). In 2-DE, proteins are first separated by isoelectric focusing (IEF) and then by SDS-PAGE in the second, perpendicular dimension. Separated proteins are visualized at high sensitivity by staining or autoradiography, producing two-dimensional arrays of proteins. 2-DE gels are, at present, the most commonly used means of global display of proteins in complex samples. The separation of thousands of proteins has been achieved in a single gel [24, 25], and differentially modified proteins are frequently separated. Due to the compatibility of 2-DE with high concentrations of detergents, protein denaturants and other additives promoting protein solubility, the technique is widely used.

The second step of this type of proteome analysis is the identification and analysis of separated proteins. Individual proteins from polyacrylamide gels have traditionally been identified using N-terminal sequencing [26, 27], internal peptide sequencing [28, 29], immunoblotting or comigration with known proteins [30]. The recent dramatic growth of large-scale genomic and expressed sequence tag (EST) sequence databases has resulted in a fundamental change in the way proteins are identified by their amino acid sequence. Rather than by the traditional methods described above, protein sequences are now frequently determined by correlating mass spectral or tandem mass spectral data of peptides derived from proteins with the information contained in sequence databases [31-33].

There are a number of alternative approaches to proteome analysis currently under development. There is considerable interest in developing a proteome analysis strategy which bypasses 2-DE altogether, because it is considered a relatively slow and tedious process, and because of perceived difficulties in extracting proteins from the gel matrix for analysis. However, 2-DE as a starting point for proteome analysis has many advantages compared to other techniques available today. The most significant strengths of the 2-DE-MS approach include the relatively uniform behavior of proteins in gels, the ability to quantify spots, and the high resolution and simultaneous display of hundreds to thousands of proteins within a reasonable time frame.

A schematic diagram of a typical procedure for the identification of gel-separated proteins is shown in Fig. 2. Protein spots detected in the gel are enzymatically or chemically fragmented, and the peptide fragments are isolated for analysis, as already indicated, most frequently by MS or MS/MS. There are numerous protocols for the generation of peptide fragments from gel-separated proteins. They can be grouped into two categories: digestion in the gel slice [28, 34] or digestion after electrotransfer out of the gel onto a suitable membrane ([29, 35-37], reviewed in [38]). In most instances either technique is applicable and yields good results. The analysis of MS or MS/MS data is an important step in the whole process because MS instruments can generate an enormous amount of information which cannot easily be managed manually. Recently, a number of groups have developed software systems dedicated to the use of peptide MS and MS/MS spectra for the identification of proteins. Proteins are identified by correlating the information contained in the MS spectra of protein digests, or in the MS/MS spectra of individual peptides, with data contained in DNA or protein sequence databases.

Figure 2. Schematic diagram of a procedure for identification of gel-separated proteins. Peptides can either be separated by a technique such as LC or CE, or infused as a mixture and sorted in the MS. Database searching can be performed on peptide masses from an MS spectrum, on peptide fragment masses from CID spectra of peptides, or on a combination of both.

The systems we are currently using in our laboratory are based on the separation of the peptides contained in protein digests by narrow-bore or capillary liquid chromatography [39, 40] or capillary electrophoresis [41], the analysis of the separated peptides by electrospray ionization (ESI) MS/MS, and the correlation of the generated peptide spectra with sequence databases using the SEQUEST program developed at the University of Washington [32, 33]. The system automatically performs the following operations: a particular peptide ion, characterized by its mass-to-charge ratio, is selected in the MS out of all the peptide ions present in the system at a particular time; the selected peptide ion is collided in a collision cell with argon (collision-induced dissociation, CID) and the masses of the resulting fragment ions are determined in the second sector of the tandem MS; this experimentally determined CID spectrum is then correlated with the CID spectra predicted from all the peptides in a sequence database which have essentially the same mass as the peptide selected for CID; this correlation matches the isolated peptide with a sequence segment in a database and thus identifies the protein from which the peptide was derived. There are a number of alternative programs which use peptide CID spectra for protein identification, but we use the SEQUEST system because it is currently the most highly automated program and has proven to be successful, versatile and robust.
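The correlation step described above can be illustrated with a minimal Python sketch (Python 3.9+). It is not SEQUEST: SEQUEST scores candidates by cross-correlating the experimental spectrum with predicted spectra, whereas this sketch substitutes a simple count of shared fragment peaks. It assumes fully tryptic peptides, singly charged b- and y-ions without neutral losses, and arbitrary tolerances (1.0 Da on the precursor, 0.5 Da on fragments); all function names and the toy sequences are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added across the peptide termini
PROTON = 1.007276   # charge carrier of a singly protonated ion


def tryptic_peptides(protein: str) -> list[str]:
    """In-silico trypsin digest: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def precursor_mh(peptide: str) -> float:
    """Singly protonated peptide mass, [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def fragment_ions(peptide: str) -> list[float]:
    """Predicted singly charged b- and y-ion m/z values (no neutral losses)."""
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(RESIDUE[aa] for aa in peptide[:i]) + PROTON)          # b ion
        ions.append(sum(RESIDUE[aa] for aa in peptide[i:]) + WATER + PROTON)  # y ion
    return ions


def shared_peaks(observed, predicted, tol):
    """Number of predicted fragment ions matched by at least one observed peak."""
    return sum(any(abs(o - p) <= tol for o in observed) for p in predicted)


def identify(precursor, observed, proteins, mass_tol=1.0, frag_tol=0.5):
    """Return (score, peptide, protein) for the best candidate whose [M+H]+
    lies within mass_tol of the selected precursor, or None if none does."""
    scored = [
        (shared_peaks(observed, fragment_ions(pep), frag_tol), pep, name)
        for name, seq in proteins.items()
        for pep in tryptic_peptides(seq)
        if abs(precursor_mh(pep) - precursor) <= mass_tol
    ]
    return max(scored, default=None)


if __name__ == "__main__":
    db = {"protA": "SAMPLERGTIDEK", "protB": "LAGERKWWR"}  # toy sequences
    spectrum = fragment_ions("GTIDEK")  # stand-in for a measured CID spectrum
    print(identify(precursor_mh("GTIDEK"), spectrum, db))  # (10, 'GTIDEK', 'protA')
```

The precursor-mass filter mirrors the restriction in the text to database peptides "which have essentially the same mass as the peptide selected for CID"; only those candidates are scored against the fragment spectrum.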

3.3 Protein identification by LC-MS/MS, capillary LC-MS/MS and CE-MS/MS

It has been demonstrated repeatedly that MS has a very high intrinsic sensitivity. For the routine analysis of gel-separated proteins at high sensitivity, the most significant challenge is the handling of small amounts of sample. The crux of the problem is the extraction and transfer of peptide mixtures, generated by the digestion of low nanogram amounts of protein, from gels into the MS/MS system without significant loss of sample or introduction of unwanted contaminants. We employ three different systems for introducing gel-purified samples into an MS, depending on the level of sensitivity required. As an approximate guideline, for samples containing tens of picomoles of peptides, LC-MS/MS is most appropriate; for samples containing low picomole to high femtomole amounts we use capillary LC-MS/MS; and for samples containing femtomoles or less, CE-MS/MS is the method of choice.
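The sensitivity guideline above can be written as a simple decision rule. The numeric cut-offs in this sketch (10 pmol and 100 fmol) are assumptions chosen only to make the approximate ranges in the text concrete; in practice the ranges overlap, and the choice also depends on sample purity and the need for automation.

```python
def choose_method(amount_fmol: float) -> str:
    """Pick a sample-introduction route from the estimated peptide amount.

    Illustrative, assumed cut-offs: >= 10 pmol (10 000 fmol) -> LC-MS/MS,
    >= 100 fmol -> capillary LC-MS/MS, anything less -> CE-MS/MS.
    """
    if amount_fmol >= 10_000:
        return "LC-MS/MS"
    if amount_fmol >= 100:
        return "capillary LC-MS/MS"
    return "CE-MS/MS"


if __name__ == "__main__":
    for amount in (50_000, 2_000, 20):
        print(amount, "fmol ->", choose_method(amount))
```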

3.3.1 LC-MS/MS

The coupling of an MS to an HPLC system using a 0.5 mm diameter or larger reversed-phase (RP) column has been described in detail [42]. This system has several advantages. If a large number of samples are to be analyzed and all are available in sufficient quantity, the LC-MS and database searching programs can be run in a fully automated mode using an autosampler, thus maximizing sample throughput and minimizing the need for operator intervention. The relatively large column is tolerant of high levels of impurities from either gel preparation or sample matrix. Lastly, if configured with a flow splitter and microsprayer [40], analyses can be performed on a small fraction of the sample (less than 5%) while the remainder of the sample is recovered in very pure solvents. This latter feature is particularly useful when an orthogonal technique, such as scintillation counting of an introduced radiolabel, is also used to analyze peptide fractions, and these data can be correlated with peptides identified by CID spectra.

3.3.2 Capillary LC-MS

An increase of sensitivity of approximately tenfold can be achieved by using a capillary LC system with a 100 µm ID column rather than a 0.5 mm ID column as referred to above. Since very low flow rates are required for such columns, most reports have used a precolumn flow-splitting system for producing solvent gradients. We have recently described the design and construction of a novel gradient mixing system which enables the formation of reproducible gradients at very low flow rates (low nL/min) without the need for flow splitting (A. Ducret et al., submitted for publication). Using this capillary LC-MS/MS system we were able to identify gel-separated proteins when low picomole to high femtomole amounts were loaded onto the gel [40]. This system is not yet automated and, like all capillary LC systems, is prone to blockage of the columns by microparticulates when analyzing gel-separated proteins.

3.3.3 CE-MS/MS

The highest level of sensitivity for analyzing gel-separated proteins can be achieved by using capillary electrophoresis-mass spectrometry (CE-MS). We have previously described a solid-phase extraction capillary electrophoresis (SPE-CE) system which was used with triple quadrupole and ion trap ESI-MS/MS systems for the identification of proteins at the low femtomole to subfemtomole sensitivity level [43, 44]. While this system is highly sensitive, its operation is labor-intensive and has not been automated. In order to devise an analytical system with both the sensitivity of CE and the level of automation of LC, we have constructed microfabricated devices for the introduction of samples into ESI-MS for high-sensitivity peptide analysis.

The basic device is a piece of glass into which channels 10-30 µm deep and 50-70 µm in diameter are etched using photolithography/etching techniques similar to those used in the semiconductor industry (a simple device is shown in Fig. 3). The channels are connected to an external high-voltage power supply [45]. Samples are manipulated on the device, and moved off the device to the MS, by applying different potentials to the reservoirs. This creates a solvent flow by electroosmotic pumping. Therefore, without the need for valves or gates and without any external pumping, the flow can be redirected simply by switching the position of the electrodes on the device. The direction and rate of the flow can be modulated by the magnitude and polarity of the applied electric field and also by the charge state of the surface.
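The dependence of the flow on field strength, polarity and surface charge noted above follows the Helmholtz-Smoluchowski relation, v = -(εζ/η)E, where ζ is the zeta potential of the channel wall, ε the permittivity and η the viscosity of the liquid. The sketch below evaluates it; the zeta potential (-50 mV), the water-like permittivity and viscosity, and the 1 kV over 5 cm field are illustrative assumptions, not parameters of the device described here.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def eof_velocity(zeta_v, field_v_per_m, rel_permittivity=78.5, viscosity_pa_s=0.89e-3):
    """Helmholtz-Smoluchowski electroosmotic velocity (m/s).

    v = -(eps_r * eps0 * zeta / eta) * E. With a negatively charged wall
    (zeta < 0) the bulk liquid moves in the direction of E, i.e. toward the
    cathode. Defaults approximate water at about 25 degrees C.
    """
    mobility = -(rel_permittivity * EPS0 * zeta_v) / viscosity_pa_s  # m^2/(V*s)
    return mobility * field_v_per_m


if __name__ == "__main__":
    field = 1000 / 0.05  # assumed 1 kV across a 5 cm channel, i.e. 2e4 V/m
    print(f"{eof_velocity(-0.050, field) * 1e3:.2f} mm/s")   # toward the cathode
    print(f"{eof_velocity(-0.050, -field) * 1e3:.2f} mm/s")  # polarity reversed
    print(f"{eof_velocity(+0.050, field) * 1e3:.2f} mm/s")   # wall charge reversed
```

Reversing either the field polarity or the sign of the wall charge reverses the flow, which is why switching electrode positions is enough to redirect samples without valves.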

The type of data generated by the system is illustrated in Fig. 4, which shows the mass spectrum of a peptide sample representing the tryptic digest of carbonic anhydrase at 290 fmol/µL. Each numbered peak indicates a peptide successfully identified as being derived from carbonic anhydrase. Some of the unassigned signals may be chemical or peptide contaminants. The MS is programmed to automatically select each peak and subject the peptide to CID. The resulting CID spectra are then used to identify the protein by correlation with sequence databases. Therefore, this system allows us to concurrently apply a number of protein digests onto the device, to sequentially mobilize the samples, to automatically generate CID spectra of selected peptide ions and to search sequence databases for protein identification. These steps are performed automatically without the need for user input, and proteins can be identified at very low femtomole sensitivity at a rate of approximately one protein per 15 min.
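As a rough illustration of what this rate means for the throughput requirement raised in Section 3.1, the arithmetic below assumes continuous, unattended operation at one protein per 15 min. Using the approximately 6000 yeast gene products mentioned in Section 3.4 as a target count, and the 24-h duty cycle, are assumptions for illustration only.

```python
MINUTES_PER_PROTEIN = 15  # approximate identification rate stated in the text


def proteins_per_day(hours_per_day: float = 24) -> float:
    """Identifications per day at the stated rate, assuming no downtime."""
    return hours_per_day * 60 / MINUTES_PER_PROTEIN


def days_for(n_proteins: int, hours_per_day: float = 24) -> float:
    """Days of instrument time needed to identify n_proteins, one at a time."""
    return n_proteins / proteins_per_day(hours_per_day)


if __name__ == "__main__":
    print(proteins_per_day())    # 96.0 identifications per day
    print(days_for(6000))        # 62.5 days of continuous operation
    print(days_for(6000, 8))     # 187.5 days with an assumed 8-h working day
```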

3.4 Assessment of 2-DE-MS proteome technology 

Using a combination of the analytical techniques described above we have identified the 80 protein spots indicated in Fig. 5. The protein pattern was generated by separating a total of 40 µg of protein contained in a total cell lysate of the yeast strain YPH499 by high-resolution 2-DE and silver staining of the separated proteins. To estimate how far this type of proteome analysis can penetrate towards the identification of low-abundance proteins, we have calculated the codon bias of the genes encoding the respective proteins. Codon bias is a calculated measure of the extent to which a gene preferentially uses a subset of the synonymous (redundant) triplet DNA codons available for each amino acid. It has been shown to be a useful indicator of the level of the protein product of a particular gene sequence present in a cell [46]. The general rule is that the higher the codon bias calculated for a gene, the more abundant its protein product. The calculated codon bias values corresponding to the proteins identified in Fig. 5 are shown in Fig. 6b. Nearly all of the proteins identified (> 95%) have codon bias values of > 0.2, indicating that they are highly abundant in cells. In contrast, codon bias values calculated for the entire yeast genome (Fig. 6a) show that the majority of proteins present in the proteome have a codon bias of < 0.2 and are thus of low abundance.
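The text does not specify the codon bias index used for Fig. 6. One widely used measure is the codon bias index (CBI) of Bennetzen and Hall, CBI = (N_pfr - N_rand)/(N_tot - N_rand), where N_pfr counts preferred codons, N_tot counts codons for amino acids that have a preferred codon, and N_rand is the number of preferred codons expected if synonymous codons were used at random. A value of 1 means only preferred codons are used, and about 0 means random usage. The sketch assumes this formulation, and its three-amino-acid preferred-codon table is a hypothetical toy, not the yeast table.

```python
# Hypothetical toy table: amino acid -> (preferred codons, all synonymous codons).
# A real analysis would use the organism's full preferred-codon set.
CODON_TABLE = {
    "K": ({"AAG"}, {"AAA", "AAG"}),
    "F": ({"TTC"}, {"TTT", "TTC"}),
    "G": ({"GGT"}, {"GGT", "GGC", "GGA", "GGG"}),
}
AA_OF = {codon: aa for aa, (_, syn) in CODON_TABLE.items() for codon in syn}


def codon_bias_index(cds: str) -> float:
    """Bennetzen-Hall style index on the toy table: 1.0 when only preferred
    codons are used, about 0 for random synonymous codon usage."""
    n_tot = n_pref = 0
    n_rand = 0.0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        aa = AA_OF.get(codon)
        if aa is None:
            continue  # codon not covered by the toy table
        preferred, synonyms = CODON_TABLE[aa]
        n_tot += 1
        n_pref += int(codon in preferred)
        n_rand += len(preferred) / len(synonyms)  # expectation under random usage
    if n_tot == n_rand:
        raise ValueError("no informative codons in sequence")
    return (n_pref - n_rand) / (n_tot - n_rand)


if __name__ == "__main__":
    print(codon_bias_index("AAGTTCGGT"))  # all preferred codons
    print(codon_bias_index("AAGAAA"))     # one preferred, one not
```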

This finding is of considerable importance in our assessment of the current status of proteome analysis technology. It is clear that even using highly sensitive analytical techniques, we are only able to visualize and identify the more abundant proteins. Since many important regulatory proteins are present only at low abundance, these would not be amenable to analysis using such techniques. This situation would be exacerbated in the analysis of proteomes containing many more proteins than the approximately 6000 gene products present in yeast cells [16]. In the analysis of the proteome of any human cell type, for example, there are potentially 50 000-100 000 gene products [47]. Inherent limitations on the amount of protein that can be loaded on 2-DE, and on the number of components that can be resolved, indicate that only the most highly abundant fraction of the many gene products could be successfully analyzed. One approach that has been employed to circumvent these limitations is the use of very narrow range immobilized pH gradient strips for the first-dimension separation of 2-DE [48]. Since only those proteins which focus within the narrow range will enter the second dimension of separation, a much higher sample loading within the desired range is possible. This, in turn, can lead to the visualization and identification of less abundant proteins.
4 Utility of proteome analysis for biological research

For the success of proteomics as a mainstream approach to the analysis of biological systems, it is essential to define how proteome analysis and biological research projects intersect. Without a clear plan for the implementation of proteome-type approaches into biological research projects, the full impact of the technology cannot be realized. The literature indicates that proteome analysis is used both as a database/data archive and as a biological assay or biological research tool.

4.1 The proteome as a database

The use of proteomics as a database or data archive essentially entails an attempt to identify all the proteins in a cell or species and to annotate each protein with the known biological information that is relevant for each protein. The level of annotation can, of course, be extensive. The most common implementation of this idea is the separation of proteins by high-resolution 2-DE, the identification of each detected protein spot and the annotation of the protein spots in a 2-DE gel database format. This approach is complicated by the fact that it is difficult to precisely define a proteome and to decide which proteome should be represented in the database. In contrast to the genome of a species, which is essentially static, the proteome is highly dynamic. Processes such as differentiation, cell activation and disease can all significantly change the proteome of a species. This is illustrated in Fig. 7. The figure shows two high-resolution 2-DE maps of proteins isolated from rat serum. Fig. 7A is from the serum of normal rats, while Fig. 7B is acute-phase serum from rats previously treated with an inflammation-causing agent [49]. It is obvious that the protein patterns are significantly different in several areas, raising the question of exactly which proteome is being described.

Therefore, a comprehensive proteome database of a species or cell type needs to contain all of the parameters which describe the state and the type of the cells from which the proteins were extracted, as well as the software tools to search the database with queries which reflect the dynamics of biological systems. A comprehensive proteome database should be capable of quantitatively describing the fate of each protein if specific systems and pathways are activated in the cell. Specifically, the quantity, the degree of modification, the subcellular location and the nature of molecules specifically interacting with a protein, as well as the rate of change of these variables, should be described. Using these admittedly stringent criteria, there is currently no complete proteome database. A number of such databases are, however, in the process of being constructed. The most advanced among them, in our opinion, are the yeast protein database YPD [50] (accessible at http://www.ypd.com) and the human 2D-PAGE databases of the Danish Centre for Human Genome Research [12] (accessible at http://biobase.dk/cgi-bin/celis). While neither can be considered complete, as not all of the potential gene products are identified, both contain extensive annotation of supplemental information for many of the spots which are positively identified in reference samples.
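The requirements listed above can be made concrete as a record type. The schema below is a hypothetical sketch (Python 3.9+), not the structure of YPD or of the Danish 2D-PAGE databases: each record ties one protein to one explicitly described sample state, so that a query can ask how a protein changes between states rather than returning a single static entry. The protein names and values in the example are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinRecord:
    """One protein measured in one explicitly described sample state (hypothetical schema)."""
    protein: str
    source: str             # species / cell type / tissue the proteins were extracted from
    state: str              # e.g. "normal", "acute phase", "activated"
    quantity: float         # e.g. normalized spot volume
    modifications: list[str] = field(default_factory=list)
    location: str = "unknown"                                # subcellular location
    interactors: list[str] = field(default_factory=list)    # interacting molecules


def profile(db, protein, source):
    """Quantity of one protein in every recorded state of one source."""
    return {r.state: r.quantity for r in db if r.protein == protein and r.source == source}


def change(db, protein, source, from_state, to_state):
    """Fold change in quantity of a protein between two recorded states."""
    p = profile(db, protein, source)
    return p[to_state] / p[from_state]


if __name__ == "__main__":
    db = [
        ProteinRecord("protein_X", "rat serum", "normal", 1.0),
        ProteinRecord("protein_X", "rat serum", "acute phase", 4.0),
        ProteinRecord("protein_Y", "rat serum", "normal", 2.0),
    ]
    print(profile(db, "protein_X", "rat serum"))
    print(change(db, "protein_X", "rat serum", "normal", "acute phase"))
```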

4.2 The proteome as a biological assay 

The use of proteome analysis as a biological assay or research tool represents an alternative approach to integrating biology with proteomics. To investigate the state of a system, samples are subjected to a specific process that allows the quantitative or qualitative measurement of some of the variables which describe the system. In typical biochemical assays, one variable (e.g., enzyme activity) of a single component (e.g., a particular enzyme) is measured. Using proteomics as an assay, multiple variables (e.g., expression level, rate of synthesis, phosphorylation state) are measured concurrently on many (ideally all) of the proteins in a sample. The use of proteomics as an assay is a less far-reaching proposition than the construction of a comprehensive proteome database. It does, however, represent a pragmatic approach which can be adapted to investigate specific systems and pathways, as long as the interpretation of the results takes into account that with current technology not all of the variables which describe the system can be observed (see Section 3.4).

A common implementation of proteome analysis as a biological assay is when a 2-DE protein pattern generated from the analysis of an experimental sample is compared to an array of reference patterns representing different states of the system under investigation. The state of the experimental system at the time the sample was generated is therefore determined by the quantitative comparative analysis of hundreds to a few thousand proteins. Comparative analysis of the 2-DE patterns furthermore highlights quantitative and qualitative differences in the protein profiles which correlate with the state of the system. For this type of analysis it is not essential that all the proteins are identified or even visualized, although the results become more informative as more proteins are compared. It is obvious, however, that the possibility to identify any protein deemed characteristic for a particular state dramatically enhances this approach by opening up new avenues for experimentation.
Figure 7. High-resolution 2-DE map of proteins isolated from rat serum with or without prior exposure to an inflammation-causing agent. (A) Normal rat serum; (B) acute-phase serum from rats which had previously been exposed to an inflammation-causing agent. The first dimension of separation is an IPG from pH 4-10, and the second dimension is a 7.5-17.5%T gradient SDS-PAGE gel. Proteins were visualized by staining with amido black. Further details of experimental procedures are included in (M, 49).
Proteome analysis as a biological assay has been successfully used in the field of toxicology, to characterize disease states or to study differential activation of cells. The approach is limited, of course, by the fact that only the visible protein spots are included in the assay, and it is well known that only a substantial but far from complete fraction of cellular proteins is detected when a total cell lysate is separated by 2-DE. Proteins may not be detected in 2-DE gels because they are not abundant enough to be visualized by the detection method used, because they do not migrate within the boundaries (size, pI) resolved by the gel, because they are not soluble under the conditions used, or for other reasons.

A different way to use proteome analysis as a biological assay to define the state of a biological system is to take advantage of the wealth of information contained in 2-DE protein patterns. 2-DE is referred to as two-dimensional because of the electrophoretic mobility and the isoelectric point which define the position of each protein in a 2-DE pattern. In addition to the two dimensions used to generate the protein patterns, a number of additional data dimensions are contained in the protein patterns. Some of these dimensions, such as protein expression level, phosphorylation state, subcellular location, association with other proteins, and rate of synthesis or degradation, indicate the activity state of a protein or a biological system. Comparative analysis of 2-DE protein patterns representing different states is therefore ideally suited for the detection, identification and analysis of suitable markers. Once again it must be emphasized that in this type of experiment only a fraction of the cellular proteins is analyzed. Since many regulatory proteins are of low abundance, this limitation is a concern, particularly in cases in which regulatory pathways are being investigated.
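The marker search described above can be sketched as a comparison of matched spot intensities between two states. The two-fold (1.0 log2 unit) threshold and the small floor added before taking ratios, which lets spots seen in only one state register as qualitative changes, are assumptions; spot matching and normalization are not shown.

```python
import math


def candidate_markers(state_a, state_b, min_log2_fc=1.0, floor=1e-3):
    """Return {spot_id: log2(b / a)} for spots whose intensity differs by at
    least min_log2_fc between two states. A spot missing from one state is
    treated as intensity 0; the floor keeps the ratio finite."""
    markers = {}
    for spot in set(state_a) | set(state_b):
        a = state_a.get(spot, 0.0)
        b = state_b.get(spot, 0.0)
        lfc = math.log2((b + floor) / (a + floor))
        if abs(lfc) >= min_log2_fc:
            markers[spot] = lfc
    return markers


if __name__ == "__main__":
    control = {"s1": 10, "s2": 5, "s3": 4}
    treated = {"s1": 10.5, "s2": 20, "s4": 3}
    for spot, lfc in sorted(candidate_markers(control, treated).items()):
        print(spot, round(lfc, 2))
```

Spot `s1` changes by only about 5% and is not reported, while `s2` (up four-fold), `s3` (lost) and `s4` (newly appearing) are. As the text stresses, this can only flag markers among the spots that are actually visualized.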

5 Concluding remarks 

In this report we have addressed three main issues related to proteome analysis. First, we have discussed the rationale for studying proteomes. Second, we have assessed the technical feasibility of analyzing proteomes and described current proteome technology, and third, we have analyzed the utility of proteome analysis for biological research. It is apparent that proteome analysis is an essential tool in the analysis of biological systems. The multi-level control of protein synthesis and degradation in cells means that only the direct analysis of mature protein products can reveal their correct identities, their relevant state of modification and/or association, and their amounts. Recently developed methods have enabled the identification of proteins at ever-increasing sensitivity levels and with a high level of automation of the analytical processes. A number of technical challenges, however, remain. While it is currently possible to identify essentially any protein spot that can be visualized by common staining methods, it is apparent that without prior enrichment only a relatively small and highly selected population of long-lived, highly expressed proteins is observed. There are many more proteins in a given cell which are not visualized by such methods. Frequently it is the low-abundance proteins that execute key regulatory functions.



We have outlined the two principal ways in which proteome analysis is currently being used to intersect with biological research projects: the proteome as a database or data archive, and proteome analysis as a biological assay. Both approaches have in common that at present they are conceptually and technically limited. Current proteome databases are typically limited to one cell type and one state of a cell and therefore do not account for the dynamics of biological systems. The use of proteome analysis as a biological assay can provide a wealth of information, but it is limited to the proteins detected and is therefore not truly proteome-wide. These limitations in proteomics are to a large extent a reflection of the fact that proteins in their fully processed form cannot easily be amplified and are therefore difficult to isolate in amounts sufficient for analysis or experimentation. The fact that to date no complete proteome has been described further attests to these difficulties. With continued rapid progress in protein analysis technology, however, we anticipate that the goal of complete proteome analysis will eventually become attainable.
Histopathology is insufficient to predict disease progression and clinical outcome in lung adenocarcinoma. Here we show that gene-expression profiles based on microarray analysis can be used to predict patient survival in early-stage lung adenocarcinomas. Genes most related to survival were identified with univariate Cox analysis. Using either two equivalent but independent training and testing sets, or leave-one-out cross-validation analysis with all tumors, a risk index based on the top 50 genes identified low-risk and high-risk stage I lung adenocarcinomas, which differed significantly with respect to survival. This risk index was then validated using an independent sample of lung adenocarcinomas, in which it predicted high- and low-risk groups. This index included genes not previously associated with survival. The identification of a set of genes that predict survival in early-stage lung adenocarcinoma allows delineation of a high-risk group that may benefit from adjuvant therapy.
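The abstract does not spell out how the risk index is computed. A common construction, assumed here rather than taken from the paper, is to fit a univariate Cox model per gene, rank genes by the Wald statistic |β/SE|, and score each patient as the sum of β times expression over the top genes, splitting patients at a cut-off (the median in this sketch, also an assumption). The Newton-Raphson fit ignores tied event times for brevity, and all function names are hypothetical.

```python
import math
from statistics import median


def univariate_cox(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox
    partial likelihood. Assumes no tied event times. Returns (beta, se)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for pos, i in enumerate(order):
            if not events[i]:
                continue  # censored patients only contribute to risk sets
            risk = order[pos:]  # patients still under observation at times[i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info <= 0:
            raise ValueError("covariate does not vary within the risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)  # SE from the observed information


def top_genes(times, events, expr, k):
    """Rank genes (dict gene -> values per patient) by |beta / se|."""
    fits = []
    for gene, values in expr.items():
        beta, se = univariate_cox(times, events, values)
        fits.append((abs(beta / se), gene, beta))
    fits.sort(reverse=True)
    return [(gene, beta) for _, gene, beta in fits[:k]]


def risk_index(expr, weights):
    """Per-patient score: sum over selected genes of beta * expression."""
    n = len(next(iter(expr.values())))
    return [sum(beta * expr[gene][p] for gene, beta in weights) for p in range(n)]


def risk_groups(scores, cutoff=None):
    """Label patients high/low risk (median cut-off is an assumption)."""
    cutoff = median(scores) if cutoff is None else cutoff
    return ["high" if s > cutoff else "low" for s in scores]
```

In a training/testing design, `top_genes` and the cut-off would be fixed on the training set and `risk_index` then applied unchanged to the test set, so that the test patients play no part in choosing the genes.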



Lung cancer remains the leading cause of cancer death in industrialized countries. Most patients with non-small cell lung cancer (NSCLC) present with advanced disease, and despite recent advances in multi-modality therapy, the overall 10-year survival rate remains a dismal 8-10% (ref. 1). However, a significant minority of patients (~25-30%) with NSCLC have stage I disease and receive surgical intervention alone. Although 35-50% of patients with stage I disease will relapse within 5 years (refs. 2-4), it is not currently possible to identify specific high-risk patients.

Adenocarcinoma is currently the predominant histological subtype of NSCLC (refs. 1,5,6). Although morphological assessment of lung carcinomas can roughly stratify patients, there is a need to identify patients at high risk for recurrent or metastatic disease. Preoperative variables that affect survival of patients with NSCLC have been identified (refs. 7-10). Tumor size, vascular invasion, poor differentiation, high tumor proliferative index and several genetic alterations, including K-ras (refs. 11,12) and p53 (refs. 10,13) mutations, have prognostic significance. Multiple independently assessed genes or gene products have also been investigated to better predict patient prognosis in lung cancer (refs. 14-18). Technologies that simultaneously analyze the expression of thousands of genes (ref. 19) can be used to correlate gene-expression patterns with numerous clinical parameters, including patient outcome, to better predict tumor behavior in individual patients (ref. 20). Analyses of lung cancers using array technologies have identified subgroups of tumors that differ according to tumor type and histological subclasses and, to a lesser extent, survival among adenocarcinoma patients (refs. 21,22). Here we correlated gene-expression profiles with clinical outcome in a cohort of patients with lung adenocarcinoma and identified specific genes that predict survival among patients with stage I disease. For further validation, we also show that the risk index predicted survival in an independent cohort of stage I lung adenocarcinomas.

Hierarchical profile clustering yields three tumor subsets 

Using oligonucleotide arrays, we generated gene-expression profiles for 86 primary lung adenocarcinomas, including 67 stage I and 19 stage III tumors, as well as 10 non-neoplastic lung samples. Selected sample replicates showed high correlation coefficients and reliable reproducibility. We determined transcript abundance using a custom algorithm, and the data set was trimmed of genes expressed at extremely low levels; that is, genes were excluded if their 75th percentile value was less than 100. Although this trimming potentially results in the loss of some information, it decreased the possibility that the clustering algorithm would be strongly influenced by genes with little or no expression in these samples. Hierarchical clustering with the resulting 4,966 genes yielded 3 clusters of tumors (Fig. 1). All 10 non-neoplastic samples clustered tightly together within Cluster 1 (data not shown). We examined the relationships between cluster and patient and tumor characteristics (Fig. 1 and Supplementary Figure A online). There were associations between cluster and stage (P = 0.030) and between cluster and differentiation (P = 0.01). Cluster 1 contained the greatest percentage (42.8%) of well differentiated tumors, followed by Cluster 2 (27%) and Cluster 3 (4.7%). Cluster 3 contained the highest percentage of both poorly differentiated (47.6%) and stage III tumors (42.8%), yet contained 3 (14.3%) moderately differentiated and 1 (5%) well differentiated stage I tumor. Notably, 11 stage I tumors were present in Cluster 3, suggesting a common gene-expression profile for this subset of stage I and stage III tumors.

For patients with stage I and stage III tumors, 
the average ages were 68.1 and 64.5 years and 
the percentage of smokers was 88.9% and 
89.5%, respectively. Marginally significant as- 
sociations between cluster and smoking his- 
tory were observed (P = 0.06). A significant 
relationship between histopathological classification and
cluster was discernible only for bronchioloalveolar
adenocarcinomas (BAs), which were present only in Clusters 1
and 2 (P = 0.0055) and comprised 35.7% and 12.3%
of tumors for Clusters 1 and 2, respectively. 

We examined the heterogeneity in gene-expression profiles, based
on the trimmed data set, among normal lung samples and stage I
and stage III adenocarcinomas by calculating correlation
coefficients between all pairs of samples. In contrast to normal
lung samples, which displayed highly similar gene-expression
profiles (median correlation, 0.9), both stage I and stage III lung
tumors demonstrated much greater heterogeneity in their
expression profiles, with lower correlation coefficients (median
values, 0.82 and 0.79, respectively).
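The heterogeneity measure in the preceding paragraph (the median of the correlation coefficients over all pairs of samples) can be sketched in plain Python. This assumes Pearson correlation on the trimmed expression vectors; the text does not name the coefficient, and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_pairwise_correlation(profiles):
    """Median Pearson r over all unordered pairs of sample profiles.

    profiles: list of expression vectors (one per sample, same gene order).
    Values near 1 indicate homogeneous samples (as reported for normal lung);
    lower values indicate greater heterogeneity (as reported for tumors).
    """
    return median(pearson(a, b) for a, b in combinations(profiles, 2))
```

Applied separately to the normal, stage I and stage III groups, this yields one summary value per group, comparable to the medians of 0.9, 0.82 and 0.79 quoted above.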

Northern-blot and immunohistochemistry analyses

Of the 4,966 genes examined, 967 differed significantly between
stage I and stage III adenocarcinomas, a number in excess of that
expected by chance alone (248 at α = 0.05). Three
genes were arbitrarily selected to verify the microarray expression 
data. The mRNA from 20 of the normal lung and tumor samples 
was examined by northern-blot hybridization with probes for in- 
sulin-like growth factor-binding protein 3 (IGFBP3), cystatin C 

Fig. 1 Unsupervised classification analysis of lung adenocarcinomas. Three classes of tumors
were identified by agglomerative hierarchical clustering of gene-expression profiles using the
4,966 expressed genes. Patient and histopathological information for each lung adenocarcinoma
case by cluster designation, and methods for K-ras 12th/13th-codon mutational status and nuclear
p53 protein accumulation, are provided (Supplementary Figure A online). TN classification denotes
information regarding patient tumor size and nodal involvement. Associations between cluster
membership and patient or histopathological variables are indicated at significance level P ≤ 0.05.



and lactate dehydrogenase A (LDH-A) (Fig. 2a). Two gene probes 
not represented on the microarrays were used as controls, includ- 
ing histone H4, a potential index of overall cell proliferation, and 
28S ribosomal RNA, a control for sample loading and transfer. 
The relative amounts of IGFBP3, cystatin C and LDH-A mRNA 
strongly correlated with microarray-based measurements (Fig. 
2b). In both assays, IGFBP3 and LDH-A mRNA levels increased 
from stage I to stage III adenocarcinomas and were higher than 
those in normal lung. Cystatin C mRNA levels were more variable 
but relatively greater in normal lung than in tumors. These results
suggest that the oligonucleotide microarrays provided reliable 
measures of gene expression. The tumors showed slightly greater 
histone H4 expression than the normal lung, likely reflecting in- 
creased proliferation of tumor cells. 
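The chance expectation quoted at the start of this section is simple arithmetic: with 4,966 tests at α = 0.05, about 4,966 × 0.05 ≈ 248 genes would pass by chance if no gene truly differed between stages. The sketch below reproduces that figure and, under the additional and admittedly unrealistic assumption that the per-gene tests are independent (so the false-positive count is binomial), shows how far the observed 967 lies from that expectation. The independence assumption and the variable names are illustrative only.

```python
import math

n_genes = 4966   # genes retained after trimming
alpha = 0.05     # per-gene significance level
observed = 967   # genes differing between stage I and stage III

# Expected number of genes passing by chance if no gene truly differs.
expected = n_genes * alpha
# Binomial standard deviation under an independence assumption; co-regulated
# genes are correlated, so this understates the real spread.
sd = math.sqrt(n_genes * alpha * (1 - alpha))
z = (observed - expected) / sd

print(f"expected={expected:.1f}, sd={sd:.1f}, z={z:.1f}")
```

Even allowing for correlation among genes, an excess of roughly 720 genes over the chance expectation is large.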

Immunohistochemistry was performed for IGFBP3, cystatin C 
and HSP-70 to determine whether mRNA overexpression was re- 
flected by an increase of their corresponding proteins in tumors. 
Fig. 2 Validation analyses of gene-expression profiling. a, Northern-blot analysis of
selected candidate genes for verification of data obtained from oligonucleotide arrays.
The same sample RNA for the 4 uninvolved lung, 8 stage I and 8 stage III tumors was
used for the northern-blot and oligonucleotide array analyses.
b, Correlation analysis of quantitative data obtained from oligonucleotide
arrays and northern blots measured by integrated phosphorimager-based
signals for the IGFBP3 and LDH-A genes. The ratio of IGFBP3, cystatin C
and LDH-A mRNA to 28S rRNA was determined. The relative values for
each gene from each sample are shown. n, non-neoplastic normal lung;
1, stage I tumors; 3, stage III tumors. c, Immunohistochemical analysis of
IGFBP-3, HSP-70 and cystatin C in lung and lung adenocarcinomas.
Cytoplasmic IGFBP-3 immunoreactivity in a neoplastic gland (tumor L22)
with prominent apical staining (blue reactant staining, arrow, upper left).
Diffuse cytoplasmic HSP-70 immunoreactivity (tumor L27), yet stromal
elements show no reactivity (upper right). Normal lung parenchyma (lower
left) shows cytoplasmic cystatin C immunoreactivity in alveolar
pneumocytes (arrow) and intra-alveolar macrophages, but tumor (L90) shows
diffuse cytoplasmic cystatin C immunoreactivity with prominent apical
staining (lower right). Magnification, ×200.









[Fig. 3 survival-curve panels; x-axis: time to death (months)]

Immunoreactivity for both IGFBP-3 and HSP-70 (Fig. 2c) was
detected in the cytoplasm of the adenocarcinomas, with little
detectable reactivity in the stromal or inflammatory cells. Cystatin
C was detected in alveolar pneumocytes and intra-alveolar
macrophages in non-neoplastic lung parenchyma and also
consistently in the cytoplasm of neoplastic cells.

Gene-expression profiles predict survival

As expected, Kaplan-Meier survival curves (Fig. 3a) and log-rank 
tests indicated poorer survival among stage III compared with 
stage I adenocarcinomas (P < 0.0001). Two statistical
approaches were used to determine whether gene-expression
profiles could predict survival using the data set of 4,966 genes. In
one approach, equal numbers of randomly assigned stage I and 
stage III tumors constituted training (n = 43) and testing (n = 43) 
sets. In the training set, the top 10, 20, 50 or 75 genes were used 
to create risk indices that were evaluated for their association 
with survival using the 50th, 60th or 70th percentile cutoff 
points to categorize patients into high or low groups. The results 
were similar across cutoff points but the 50-gene risk index had 
the best overall association with survival in the training set. 



Fig. 3 Gene-expression profiles and patient survival. a, Relationship
between tumor stage and patient survival (stage I and stage III differ
significantly, P < 0.0001). b, Relationship between survival in the 43 test
samples and their risk assignments based on the 50-gene risk index
estimated in the 43 training samples. The high- and low-risk groups differ
significantly (P = 0.024). c, Relationship between patient survival and the risk
assignments in test samples (in b) conditional on tumor stage. The high-
and low-risk stage I groups differ significantly (P = 0.028), whereas the stage III
low- and high-risk groups do not (P = 0.634). d, Relationship between
survival in the test cases and their risk assignments based on the 86 'leave-one-
out' cross-validations of the 50-gene risk index. The high- and low-risk
groups differ significantly (P = 0.0006). e, Relationship between each test case's
risk assignment and survival (in d) conditional on tumor stage. The high-
and low-risk stage I lung adenocarcinoma groups differ significantly from
each other (P = 0.003), whereas low- and high-risk stage III tumors do not.
f, Relationship between tumor class identified by hierarchical clustering and
patient survival. Survival for patients in Cluster 3 differed relative to that
for Cluster 2 (P = 0.037) and approached significance relative to Clusters 1 and
2 combined (P = 0.06). g, Analysis with the Michigan-based risk index using the
top cross-validated survival genes identifies low- and high-risk groups in an
independent cohort of 84 Massachusetts-based lung adenocarcinomas that
are significantly different (P = 0.003). h, Among the 62 stage I lung
adenocarcinomas in the Massachusetts sample, the high- and low-risk groups
differed significantly (P = 0.006).



After conservatively choosing the 60th percentile cutoff point 
from the training set, we then applied this risk index and cutoff 
point to the testing set. The risk index of the top 50 genes cor- 
rectly identified low- and high-risk individuals within the inde- 
pendent testing set (P = 0.024) (Fig. 3b and Supplementary 
Methods online). Notably, 11 stage I tumors were included in 
the high-risk subgroup. When this risk assignment was then 
conditionally examined for stage progression (Fig. 3c), low- and 
high-risk groups among stage I tumors were found to differ (P = 
0.028) in their survival. 
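The split-sample procedure above (a coefficient-weighted risk index built in the training set, a cutoff fixed at a training-set percentile, then both applied unchanged to the test set) can be sketched as follows. The gene names, coefficients and toy scores are hypothetical, and the percentile uses linear interpolation, which may differ from the definition the authors used.

```python
from statistics import quantiles


def risk_scores(samples, coefs):
    """Linear risk index: sum of coefficient * expression over the selected genes.

    samples: list of dicts mapping gene -> expression value.
    coefs:   dict mapping each selected gene -> its Cox regression coefficient
             (positive = higher expression associated with poorer outcome).
    """
    return [sum(beta * s[gene] for gene, beta in coefs.items()) for s in samples]


def percentile_cutoff(scores, pct):
    """pct-th percentile (1..99) of training-set scores, linear interpolation."""
    return quantiles(scores, n=100, method="inclusive")[pct - 1]


def assign_risk(scores, cutoff):
    """Scores above the training cutoff are labelled high risk."""
    return ["high" if s > cutoff else "low" for s in scores]


if __name__ == "__main__":
    coefs = {"g1": 0.5, "g2": -1.0}                            # hypothetical genes
    train = [{"g1": x, "g2": 0.0} for x in (2, 4, 6, 8, 10)]   # scores 1.0 .. 5.0
    cutoff = percentile_cutoff(risk_scores(train, coefs), 60)  # 3.4
    test = [{"g1": 6, "g2": 0.0}, {"g1": 7, "g2": 0.0}]        # scores 3.0, 3.5
    print(assign_risk(risk_scores(test, coefs), cutoff))       # ['low', 'high']
```

Fixing the cutoff in the training set and never re-estimating it on the test samples is what keeps the test-set comparison an honest validation.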

Identification of a robust set of survival genes 

Although predictive of patient survival, a single training-testing 
set may not provide the most robust set of genes due to random 
sampling issues. Therefore, a 'leave-one-out' cross-validation ap- 
proach was used to identify genes associated with survival from 
all 86 tumor samples. We first developed a 50-gene risk index in
each training set, and then applied the risk index to the test case 
held out from the full set of tumors and assigned the held out 
tumor to the high- or low-risk groups (Fig. 3d). The high and 
low-risk subgroups determined in the test cases differed signifi- 
cantly in their overall survival (P = 0.0006). Among the larger 
group of stage I lung adenocarcinomas, the low-risk (n = 46) and 
high-risk (n = 21) groups had markedly different survival (P =
0.003) (Fig. 3e). Table 1 lists selected examples of the cumulative 
top 100 genes derived from this cross-validation procedure 
(complete list in Supplementary Table A online). 
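A minimal leave-one-out sketch of the procedure just described, in plain Python for illustration. For each held-out tumor, genes are re-ranked by a univariate Cox model fitted to the remaining tumors, a coefficient-weighted risk index and percentile cutoff are rebuilt from those tumors only, and the held-out tumor is then labelled high or low risk. Several details are assumptions rather than the authors' stated method: genes are ranked by the absolute Wald statistic, the Cox fit is a damped Newton-Raphson with Breslow risk sets, the 60th-percentile cutoff from the split-sample analysis is reused, and expression values are assumed log-transformed so that the exponentials stay finite. All function names are hypothetical.

```python
import math
from statistics import quantiles


def cox_univariate(x, times, events, max_iter=25):
    """Fit a one-covariate Cox model by damped Newton-Raphson.

    Risk sets follow the Breslow convention (everyone with time >= t_i).
    Returns (beta, z), z being beta * sqrt(information) with the information
    taken from the final Newton iteration.
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored samples contribute only through risk sets
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            score += x_i - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info <= 1e-12:
            return beta, 0.0  # no usable curvature, e.g. constant expression
        step = max(-1.0, min(1.0, score / info))  # damped Newton step
        beta += step
        if abs(step) < 1e-9:
            break
    return beta, beta * math.sqrt(info)


def top_genes(X, times, events, k):
    """Return [(row_index, beta)] for the k genes with the largest |Wald z|."""
    fits = []
    for g, row in enumerate(X):
        beta, z = cox_univariate(row, times, events)
        fits.append((abs(z), g, beta))
    fits.sort(reverse=True)
    return [(g, beta) for _, g, beta in fits[:k]]


def loo_risk_groups(X, times, events, k=50, pct=60):
    """Leave-one-out risk assignment; X is a genes x samples list of lists.

    The held-out sample never influences gene selection, coefficients or the
    cutoff used to label it.
    """
    n_samples = len(times)
    groups = []
    for held in range(n_samples):
        keep = [s for s in range(n_samples) if s != held]
        X_tr = [[row[s] for s in keep] for row in X]
        t_tr = [times[s] for s in keep]
        e_tr = [events[s] for s in keep]
        index = top_genes(X_tr, t_tr, e_tr, k)
        # Risk index: coefficient-weighted sum of expression over selected genes.
        tr_scores = [sum(b * X_tr[g][s] for g, b in index) for s in range(len(keep))]
        cutoff = quantiles(tr_scores, n=100, method="inclusive")[pct - 1]
        held_score = sum(b * X[g][held] for g, b in index)
        groups.append("high" if held_score > cutoff else "low")
    return groups
```

The list returned by `loo_risk_groups` can then be compared against observed survival with Kaplan-Meier curves and a log-rank test, as in Fig. 3d,e.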

It was also noted that many of the stage I patients in the high- 
risk subgroup (Fig. 3e) were present in Cluster 3 (Fig. 1). 
Kaplan-Meier analysis (Fig. 3f) demonstrated a significantly 
worse survival (P = 0.037) for patients in Cluster 3 relative to
patients in Cluster 2, and a difference approaching significance
relative to Clusters 1 and 2 combined (P = 0.06). This further indicates the important
relationship between gene-expression profiles and patient sur- 
vival, independent of disease stage. 

Consistent with previous analyses of lung adenocarcinomas 23 , 
40% of stage I and 57.8% of stage III tumors had 12th or 13th 
codon K-ras gene mutations. Those patients with tumors
containing K-ras mutations showed a trend of poorer survival, but

Bolded genes were also significant for survival in the 43-tumor training set (Fig. 3b).



Table 1 Selected examples of the cumulative top 100 genes identified using
training-testing cross-validation of all 86 lung tumor samples. The percent
change, as well as the direction, for the average values of the 10 non-neoplastic
lung samples relative to all tumors, and for the 67 stage I relative to the 19 stage III
tumors, are shown. A positive coefficient (β) value indicates a relationship of gene
expression to a poorer patient outcome. The genes are listed in potential functional
categories. Genes that were also present in the top 50 survival genes using the
43-tumor training set (Fig. 3b) are indicated in bold type. A complete listing of the
gene probe sets and annotated gene and UniGene identifiers can be found in the
Supplementary Methods.
Fig. 4 Gene-expression patterns of top survival genes. a, Gene-expression patterns
determined using agglomerative hierarchical clustering of the 86 lung adenocarcinomas
against the 100 survival-related genes (Table 1) identified by the training-testing
cross-validation analysis. Substantially elevated (red) or decreased (green) expression of the
genes is observed in individual tumors. Some tumors (black arrow and expanded area)
show extremely elevated expression of specific genes. b, An outlier gene-expression
pattern (>5 times the interquartile range among all samples) is observed for the erbB2 and
Reg1A genes (top left and right, respectively). The S100P and crk genes (bottom left and
right, respectively) show a graded pattern of expression related to patient survival. ○,
alive; ●, dead (also in c). c, The number of outliers per person identified in the top 100
genes, plotted by survival distribution.
this difference did not reach statistical significance among all
patients (P = 0.25), between patients within tumor clusters (P =
0.41) or when analyzed separately among stage I (P = 0.22) and
stage III (P = 0.53) patients. Nuclear accumulation of p53 was
detected in 17.9% of stage I and 22.2% of stage III tumors. No
significant relationship was observed between p53 staining and
patient survival, cluster or tumor stage.

Confirmation using an independent set of adenocarcinomas 

The robustness of our 50-gene risk index in predicting survival in
lung adenocarcinomas was tested using oligonucleotide
gene-expression data obtained from a completely independent
(Massachusetts-based) sample of 84 lung adenocarcinomas (62
stage I, 14 stage II and 8 stage III; ref. 21, and dataset A at
www.genome.wi.mit.edu/MPR/lung). To ensure equivalent
power for testing and comparability of samples, the criteria for
including tumors in the analysis were 40% or greater tumor
cellularity, no mixed histology (that is, adenosquamous) and
available patient survival information. To obtain comparable
gene-expression measures between the two data sets, gene
sequences present on both the U95A and HuGeneFL arrays were
examined, and expression data for our top 50 cross-validation
genes were obtained and processed for all 84 Massachusetts
samples (see also Supplementary Methods online). When we
examined the risk assignment of these 84 samples, employing the
identical cutoff point used for the 86 Michigan-based lung samples,
we observed distinct low- and high-risk groups (Fig. 3g; P = 0.003).
Notably, among the 62 stage I tumors, high- and low-risk groups
were observed that differed significantly (P = 0.006) in their
survival (Fig. 3h).

Survival genes had graded and outlier expression patterns 

A statistical and graphical analysis of the 100 survival-related
genes (Table 1) clustered against all 86 tumors revealed
individual tumors with substantially elevated expression of either
a few or many of these genes (Fig. 4a). Among these genes, we
observed two distinct patterns of expression related to patient
survival. One pattern, designated 'outlier', included genes
showing substantially elevated expression (greater than five times
the interquartile range among all samples), whereas the other
pattern, designated 'graded', was characterized by expression that
varied continuously with patient survival (Fig. 4b). The erbB2 and
Reg1A genes are examples of outlier expression patterns, and the
S100P and crk genes of graded patterns. The number of outliers
per person in the top 100 genes was identified and plotted
according to survival times and events (Fig. 4c). Both stage I and
stage III lung adenocarcinomas showed outlier gene patterns,
and 10 tumors contained 3 or more outlier genes.
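The outlier definition above ("greater than five times the interquartile range among all samples") does not state the reference point from which the distance is measured. The sketch below assumes the common convention of measuring from the upper quartile, flagging values above Q3 + 5 × IQR, and then counts outlier genes per tumor, as plotted in Fig. 4c. The reference point, the interpolated quartiles and the function names are all assumptions.

```python
from statistics import quantiles


def outlier_indices(values, factor=5.0):
    """Indices of samples whose value exceeds Q3 + factor * IQR.

    Measuring from the upper quartile is an assumption; the text only says
    "greater than five times the interquartile range among all samples".
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fence = q3 + factor * (q3 - q1)
    return [i for i, v in enumerate(values) if v > fence]


def outliers_per_tumor(expr, factor=5.0):
    """Count, for each sample, how many genes flag it as an outlier.

    expr: dict mapping gene -> list of values, all in the same sample order.
    """
    counts = []
    for values in expr.values():
        if not counts:
            counts = [0] * len(values)
        for i in outlier_indices(values, factor):
            counts[i] += 1
    return counts
```

Under this definition a graded gene, whose values rise smoothly across tumors, rarely crosses the fence, whereas a gene that is sharply elevated in one or two tumors (as described for erbB2 in tumor L94) does.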

Because gene amplification may result in increased gene
expression, the nine genes with outlier expression patterns (erbB2,
SLC1A6, Wnt1, MGB1, Reg1A, AKAP12, PACE, CYP24 and KYNU)
and one gene with a graded expression pattern (KRT18) were
examined using quantitative genomic PCR to evaluate genomic
copy number (Fig. 5a). Gene amplification of erbB2 (17q12) was
detected in tumor L94, which had the highest erbB2 mRNA
expression (Fig. 4a). Gene amplification was not detected for any
of the other seven tested genes in tumor L94 or in other
tumors. The two genes most frequently demonstrating the
outlier pattern in these lung adenocarcinomas were KYNU and
CYP24, present in 10 and 9 tumors, respectively.
CYP24 has been described as a gene amplified and overexpressed
in breast cancer 25 , and these results indicate its elevated expression
in lung adenocarcinoma.

To determine whether the graded or outlier gene-expression 
patterns also occur at the protein-expression level, 10 of the 100 
Fig. 5 Gene amplification and protein expression of survival-related genes.
a, Analysis of potential gene amplification for 9 genes showing outlier
expression patterns in the lung tumors (erbB2, SLC1A6, Wnt1, MGB1, Reg1A,
AKAP12, PACE, CYP24 and KYNU), examined using quantitative genomic
PCR. A gene showing a graded expression pattern (KRT18) and one gene
(PACE4) with a chromosome location similar to that of PACE were used as controls.
Only erbB2 and Reg1A are shown. An esophageal adenocarcinoma with
known high-level genomic amplification of erbB2 was used as a positive
control, and normal esophagus DNA was used as a negative control (Ctl). PCR
fragment sizes were 343 bp for GAPDH, 166 bp for erbB2 and 126 bp for
Reg1A. DNA is from normal lung (N) and tumor (T) from each patient (for
example, L17). b, Immunohistochemical analysis of survival-related genes with
lung adenocarcinoma tissue microarrays using the tumors from this study. The
transmembrane erbB2 protein (top left) expression is substantially increased
in tumor L94, which contains the amplified erbB2 gene (Fig. 4a and b). Expression
of VEGF (top right) and S100P (bottom left) was located within the
neoplastic cells, and the pattern of immunoreactivity was consistent with the graded
expression pattern demonstrated by their mRNA profiles. The oncogene crk
(bottom right) was abundantly expressed in neoplastic lung
cells. Magnification, ×400 (erbB2); ×200 (VEGF, S100P and crk).
top survival genes (Table 1) for which specific antibodies were
available were chosen for immunohistochemical analysis using
lung-tumor arrays from this study (Fig. 5b). Expression of
membrane erbB2 protein was substantially increased in the
erbB2-amplified tumor L94, and very low levels of expression were
present in other tumors, consistent with mRNA-expression
measurements (Fig. 4a and b). CDC6 protein expression was also
substantially higher in tumor L94, consistent with mRNA levels
(data not shown). Expression of vascular endothelial growth
factor (VEGF) and S100P (Fig. 5b), as well as cytokeratin 18 (KRT18),
cytokeratin 7 (KRT7) and fas-associated death domain (FADD)
protein (data not shown), was located within the lung tumor
cells and consistent with the graded expression pattern of the
mRNA profiles. The oncogene crk showed both graded mRNA and
graded protein-expression patterns with survival and
was abundantly expressed in the tumor cells (Fig. 5b). These
results indicate that many survival-associated genes are expressed
at the protein level and demonstrate similar mRNA and
protein-expression patterns.

Discussion 

We used several approaches for the analysis of gene-expression 
data related to clinicopathological variables and patient sur- 
vival. One approach, hierarchical clustering, was used to exam- 
ine similarities among lung adenocarcinomas in their patterns 
of gene expression. Previous studies of lung tumors 21,22 have also 
used this method to describe subclasses of lung tumors. Here, 
we found three clusters that showed significant differences with 
respect to tumor stage and tumor differentiation. This suggests, 
as expected, that tumors with similar histological features of 
differentiation demonstrate similarities in gene expression. 
This feature also partly underlies the observed statistical associ- 
ation of tumor stage and cluster, as many of the higher-stage tu- 
mors, often poorly differentiated and previously associated 
with reduced survival 9,10 , were located in Cluster 3. Although
this cluster contained the highest percentage of stage III tu- 
mors, it also contained a nearly equal mixture of stage I and 
stage III tumors and not all tumors were poorly differentiated. 
This indicates that a subset of stage I lung adenocarcinomas 
share gene-expression profiles with higher-stage tumors. 
Notably, 10 of the 11 stage I tumors found in Cluster 3 were the
high-risk stage I tumors identified using the risk index in the 
'leave-one-out' cross-validation. 

In contrast to previous analyses of lung adenocarcinomas 21,22 , 
we validated the expression data from the arrays. The strong cor- 
relation of northern-blot analysis and oligonucleotide-array data 
for gene expression in the same samples (Fig. 2b) indicates that 
these studies provide robust gene-expression estimates. 
Immunohistochemistry using the same tumor samples in tissue 
arrays demonstrates protein expression within the lung tumor 
cells. Together, these studies indicate that many of the genes 
identified using gene-expression profiles are likely relevant to 
lung adenocarcinoma. For example, IGFBP3 gene expression is 
increased in lung adenocarcinomas (Fig. 2c). IGFBP3 protein 
modulates the autocrine or paracrine effects of insulin-like 
growth factors; elevated IGFBP3 expression is observed in colon
cancer 26 , and increased serum IGFBP3 is associated with progres- 
sion in breast cancer 27 . Heat-shock protein 70 (HSP-70) is in- 
creased in lung adenocarcinomas of smokers 28 and is associated 
with increased metastatic potential in breast cancer 29 . Increased 
serum lactate dehydrogenase is correlated with tumor stage and
tumor burden 30 , and cystatin C, a cysteine protease inhibitor
expressed in human lung cancers 31 , is prognostic in some cancers 32 .
The decreased expression of this protease inhibitor may affect 
the invasive properties of the tumor cell. 

The cross-validation analytical strategy we used is particularly 
informative for these types of gene-expression analyses for dis- 
ease outcome 33,34 , and identification of cross-validated genes with 
a larger tumor cohort may help refine this risk index for use in a 
clinical setting. The gene-expression data also provide opportuni- 
ties to observe overarching patterns that advance our under- 
standing of associations between genes and disease. For example, 
the top 100 survival genes include those involved in signaling, 
cell cycle and growth, transcription, translation and metabolism. 
Expression of many of these genes is likely a function of increased 
proliferation and metabolism in the more aggressive tumors. 
Some genes, such as erbB2 and Reg1A (Fig. 4a and b), were highly
overexpressed in a few patients having poor survival. In one
tumor, the erbB2 gene was amplified (Fig. 5a), demonstrating that
genomic changes may underlie the overexpression of a subset of
these outlier genes. Immunohistochemistry confirmed protein
overexpression in this patient's tumor (Fig. 5b). Notably, seven of
the eight outlier genes were not amplified, indicating that other 
mechanisms underlie the increased mRNA expression of these 
survival-related genes. 

Most genes showed a graded relationship between expression 
and patient survival. Genes such as that encoding VEGF, known
to be strongly associated with survival in lung cancer 35,36 , were
identified as related to patient survival in our study. VEGF
demonstrated a graded expression pattern, as did S100P and the
crk oncogene (Fig. 5b). S100P is a calcium-regulated protein not
previously reported in lung cancer. The crk gene, the cellular
homolog of the v-crk oncogene, is a member of a family of adaptor
proteins involved in signal transduction and interacts directly
with c-jun N-terminal kinase 1 (JNK1) 37 . Although crk has not
been shown to have a role in lung cancer, its role in the
MAP-kinase pathway, which leads to activation of matrix
metalloproteinase secretion and cell invasion 38 , indicates potential
involvement in the tumor cell invasion or metastasis of
some lung adenocarcinomas. Among the many genes identified
in this study, like crk, that may be causally involved in lung can- 
cer progression (Table 1), some were related to survival in many 
patients, and others in only smaller subsets of patients. This re- 
sult is consistent with the complex molecular architecture of tu- 
mors in general, the heterogeneity of lung adenocarcinomas in 
particular and the multiple mechanisms underlying tumor-cell 
survival, invasion and metastasis 39 . 

Our results demonstrate that a gene-expression risk profile,
based on the genes most associated with patient survival, can
distinguish stage I lung adenocarcinomas and differentiate
prognoses. The particular genes that define the clusters, or are
associated with survival, likely reflect the characteristics of the
particular tumors included in the analysis. Current therapy for 
patients with stage I disease usually consists of surgical resection 
without adjuvant treatment 2,3 . Clearly, the identification of a 
high-risk group among patients with stage I disease would lead 
to consideration of additional therapeutic intervention for this 
group, possibly leading to improved survival of these patients. 

Methods 

Patient population. Sequential patients seen at the University of Michigan 
Hospital between May 1994 and July 2000 for stage I or stage III lung ade-
nocarcinoma were evaluated for this study. Consent was received and the 
project was approved by the local Institutional Review Board. Primary tu- 
mors and adjacent non-neoplastic lung tissue were obtained at the time of 
surgery. Peripheral portions of resected lung carcinomas were sectioned,
evaluated by a study pathologist, compared with routine H&E sections
of the same tumors and used for mRNA isolation. Regions chosen for
analysis had a tumor cellularity greater than 70% and showed no mixed
histology, potential metastatic origin, extensive lymphocytic infiltration or
fibrosis. Tumors were histopathologically divided into two categories based on
their growth pattern: bronchial-derived, if they exhibited invasive features 
with architectural destruction, and bronchioloalveolar, if they exhibited 
preservation of the lung architecture. All stage I patients received only sur- 
gical resection with intra-thoracic nodal sampling and no other treatments. 
Stage III patients received surgical resection plus chemotherapy and radio- 
therapy. 

Gene-expression profiling and K-ras mutation analysis. RNA isolation,
cRNA synthesis and gene-expression profiling were performed as
described 24 . Details of gene annotation and K-ras mutation analysis are
provided in the supplementary information.

Northern-blot analysis. Total cellular RNA (10 µg) was separated in 1.2%
agarose-formaldehyde gels and vacuum-transferred to Gene Screen Plus
(NEN Life Science Products, Boston, Massachusetts). Hybridization
conditions and probe labeling were as described 40 . Individual
sequence-validated cDNA IMAGE clones for human IGFBP3 (clone 1407750),
LDH-A (clone 2420241) and cystatin C (CST3; clone 949938) were from
Research Genetics (Huntsville, Alabama). The human histone H4 cDNA and
the 28S ribosomal RNA 26-mer oligonucleotide probe were prepared and
labeled as described 40 .

Gene-amplification analysis. Eleven genes were selected for the analysis of
genomic alterations. Primers were designed using PrimerSelect 4.05 Windows
32 software (DNASTAR, Madison, Wisconsin), avoiding pseudogenes or po- 
tential homologous regions. Forward and reverse primers for the genes are 
provided (Supplementary Methods online). Quantitative genomic-PCR was 
then applied and analyzed as described 41 .

Immunohistochemical staining. The H&E-stained slides of all primary 
lung tumors were used to identify the most representative regions of each 
tumor and a tissue microarray (TMA) block was constructed as described 42 . 
Immunohistochemistry (IHC) was performed using both routine sections and
sections from the TMA block as described 24 . Detailed methods and the
concentrations used for all antibodies are provided in the Supplementary
Methods. 

Statistical methods. t-tests were used to identify differences in mean
gene-expression levels between comparison groups. Agglomerative hierarchical
clustering 43 was applied using the average linkage method to investigate 
whether there was evidence for natural groupings of tumor samples based 
on correlations between gene-expression profiles. To investigate the ro- 
bustness of the clustering inference, gene-expression values were per- 
turbed by adding random Gaussian error of magnitude obtained from a 
duplicate sample to each data point and then reclustered to determine
concordance in the tumors' class membership. Pearson χ2 and Fisher's exact
tests were used to assess whether cluster membership was associated with
physical and genetic characteristics of the tumors.
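The average-linkage procedure named above can be illustrated with a naive sketch that uses 1 − Pearson correlation between tumor profiles as the distance (the text says only that clustering was "based on correlations between gene-expression profiles", so this distance is an assumption) and stops at a requested number of clusters. It runs in roughly cubic time and is meant only to show the merge rule; the function names are hypothetical, and a real analysis would use an established library implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    profiles: list of per-tumor expression vectors. Returns a list of
    clusters, each a list of tumor indices.
    """
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]                  # b > a, so index a stays valid
    return clusters
```

The robustness check described above would rerun such a clustering after adding Gaussian noise to each value and compare the resulting memberships with the original ones.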

To determine whether gene-expression profiles were associated with 
variability in survival times, 2 separate but complementary approaches 
were used. In the first approach, the 86 tumors were randomly assigned to 
equivalent training and testing sets consisting of equal numbers of stage I 
and III tumors in order to validate a novel risk-index function that captured 
the effect of many genes at once. In the second approach, cross-validation 44 
was used to more robustly identify the genes associated with survival. 
Briefly, in a 'leave-one-out' cross-validation procedure, 85 of the 86
tumors (the training set) were used to identify genes that were univariately
associated with survival, and the remaining tumor served as the test case.
The risk index was defined as a linear combination of the gene-expression
values for the top genes identified by univariate Cox proportional-hazard
regression modeling 45 , weighted by their estimated regression coefficients.
Kaplan-Meier survival plots and log-rank tests were then used to assess
whether the risk-index assignment to high/low categories was validated in
the test set. A more detailed description is provided
(Supplementary Methods online).
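The survival comparisons used throughout (Fig. 3) rest on the Kaplan-Meier product-limit estimator and the two-group log-rank test. The sketch below implements both in plain Python under simple assumptions (right-censored data, event indicator 1 = death, groups coded 0/1); it is an illustration, not the software the authors used, and the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each death time.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


def logrank(times, events, groups):
    """Two-group log-rank test (groups coded 0/1). Returns (chi2, p)."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p
```

In this setting, `groups` would hold the high/low risk labels assigned to the test cases, and `times`/`events` their follow-up and vital status.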



Note: Supplementary information is available on the Nature Medicine website. 
ABSTRACT

Retinoids regulate gene transcription through activating retinoic acid
receptors (RARs)/retinoid X receptors (RXRs). Of the three RAR receptors
(α, β, and γ), RARβ has been considered a tumor suppressor gene.

Here, we identified a novel RARβ isoform, RARβ5, in breast epithelial
cells. Similar to RARβ2, the first exon (59 bp) of RARβ5 is RARβ5 isoform
specific, whereas the other exons are common to all of the RARβ isoforms.
The first exon of RARβ5 does not contain any translation start codon, and
therefore its protein translation begins at an internal methionine codon of
RARβ2, lacking the A, B, and part of the C domain of RARβ2. RARβ5
protein was preferentially expressed in estrogen receptor-negative breast
cancer cells and normal breast epithelial cells that are relatively resistant
to retinoids, whereas estrogen receptor-positive cells that did not express
detectable RARβ5 protein were sensitive to retinoid treatment, suggesting
that this isoform may affect the cellular response to retinoids. The RARβ5
isoform is unique among all of the RARs, because a corresponding isoform
was not detectable for either RARα or RARγ. RARβ5 mRNA was variably
expressed in normal and cancerous breast epithelial cells. Its transcription
was under the control of a distinct promoter, P3, which can be activated by
all-trans-retinoic acid (atRA) and other RAR/RXR-selective retinoids in
MCF-7 and T47D breast cancer cells. We mapped the RARβ5 promoter and
found a region -3e2/-99 to be the target region of atRA. In conclusion, we
identified and initially characterized RARβ5 in normal, premalignant, and
malignant breast epithelial cells. RARβ5 may serve as a potential target of
retinoids in prevention and therapy studies.

INTRODUCTION

The biological effects of retinoids are mainly mediated by two families of nuclear receptors: retinoic acid receptors (RARs) and retinoid X receptors (RXRs), each consisting of three receptor subtypes (α, β, γ; refs. 1, 2). In addition, each RAR gene generates multiple isoforms by either alternative splicing or differential usage of two promoters (1, 2). RARs/RXRs belong to the superfamily of nuclear receptors that mediate the transcriptional effects of steroid hormones, vitamin D, and thyroid hormone (3). RARs preferentially dimerize with RXRs to form RAR-RXR heterodimers that are thought to be obligatory intermediates in the effects of RAR ligands on gene expression (4). RXRs also can homodimerize to form transcriptionally active complexes (5). Homo- and heterodimeric retinoid receptor complexes bind to distinct retinoid response elements embedded in the regulatory regions of retinoid-responsive genes (6). Although there is considerable variability in the sequence and structure of the retinoid response elements in retinoid-regulated genes, they conform to a general canonical sequence in which two directly repeated receptor-binding hexanucleotide motifs [consensus (A/G)G(G/T)TCA] are separated by a variable number of intervening nucleotides (6).
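The canonical arrangement just described (two directly repeated (A/G)G(G/T)TCA half-sites separated by a variable spacer) can be searched for with a regular expression. The sketch below is a generic illustration, not the tool used in this study (which relied on MatInspector, TFSEARCH and PROMO); the function name, the assumed 1-5 nt spacer range, and the toy sequence are all hypothetical.

```python
import re
from typing import List, Tuple

# Consensus half-site quoted in the text: (A/G)G(G/T)TCA
HALF_SITE = "[AG]G[GT]TCA"


def find_direct_repeats(seq: str, min_spacer: int = 1,
                        max_spacer: int = 5) -> List[Tuple[int, int, str]]:
    """Scan a DNA sequence for two directly repeated half-sites.

    Returns (0-based start, spacer length, matched site) tuples. The
    default spacer range of 1-5 nt is an illustrative assumption.
    """
    seq = seq.upper()
    hits = []
    for n in range(min_spacer, max_spacer + 1):
        # Zero-width lookahead so overlapping candidate sites are all reported.
        pattern = re.compile(f"(?=({HALF_SITE}[ACGT]{{{n}}}{HALF_SITE}))")
        for match in pattern.finditer(seq):
            hits.append((match.start(), n, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence containing one direct repeat with a 5-nt spacer.
    print(find_direct_repeats("ttAGGTCAccgaaAGGTCAtt"))
```

Scanning each spacer length separately keeps the reported spacer explicit, which is how direct repeats are usually classified (DR1, DR2, DR5 and so on).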

RARβ itself is a retinoid target gene and is believed to play a role as a tumor suppressor gene in tumorigenesis (7). The human RARβ gene was first identified in 1987 (8), followed by the identification of a retinoic acid response element (RARE) in its promoter region (9). In the mouse, the RARβ gene generates four distinct transcripts: splice variants RARβ1 and RARβ3 from transcription at promoter P1, and RARβ2 and RARβ4 from the RARE-containing P2 promoter (2, 10). In the human, only RARβ2 and RARβ4 transcripts have been identified in normal adult cells (11). Human RARβ1 is expressed in fetal tissues and some small cell lung carcinoma cell lines (12), whereas a human homologue of the RARβ3 isoform has not been detected (7). The RARβ2 and RARβ4 transcripts differ only in the content of their 5'-most exon, a result of alternative splicing (10). On the basis of homology with other members of the steroid hormone receptor superfamily, six distinct domains (A-F) have been identified within RARs and RXRs (2). Thus far, all of the identified RAR isoforms differ only in their unique A domain and are derived from two promoters and alternative splicing (11). Isoforms of a given RAR gene generally contain identical protein sequences in domains B-F (11).

We have been interested in RAR/RXR alterations in the process of breast tumor progression using the MCF10 model (13). During the characterization of RARβ expression in the MCF10 series of cell lines (benign MCF10A, premalignant MCF10AT, and malignant MCF10CA1a cell lines; ref. 14), we identified a novel RARβ isoform, which we named RARβ5. RARβ5 mRNA expression is under the control of a distinct promoter, P3, and is mediated by all-trans-retinoic acid (atRA) and other RAR/RXR selective ligands in breast cancer cells. It was detected in both normal and breast cancer cells. We also cloned and initially characterized the promoter region of RARβ5. The same protein isoform was previously associated with the RARβ4 transcript (11, 15) and then termed RARβ' (7). In this study, we have identified RARβ5 at both the gene and protein level.

MATERIALS AND METHODS 

Cell Culture. The MCF10A cell line was received from Karmanos Cancer Institute (Detroit, MI) and cultured as described previously (13). Normal human mammary epithelial cells (HMEC) were purchased from Clonetics (Santa Rosa, CA) and cultured in MEGM with supplements (Clonetics). MCF-7, T47D, and MDA-MB435 cell lines were purchased from American Type Culture Collection (Manassas, VA). The BCA-1 to BCA-11 breast carcinoma cell lines were derived from breast cancer patients and established in our laboratory (16). These cell lines are still at early passages (passage number 7-10), and their characteristic features are summarized in Appendix I as supplemental material. All of these cell lines were cultured in MEM supplemented with 100 units/mL penicillin, 100 μg/mL streptomycin, 10% fetal bovine serum, 200 μmol/L L-glutamine, and 100 μmol/L MEM nonessential amino acids. The atRA was purchased from Sigma (St. Louis, MO). The 9-cis-retinoic acid (9-cisRA), 4-hydroxyphenyl retinamide (4-HPR), and LGD1069 were obtained from the repository of the National Cancer Institute (Bethesda, MD). Am80 (RARα/β selective ligand) was a generous gift from Dr. Koichi Shudo (ITSUU Laboratory, Tokyo, Japan).

Rapid Amplification of cDNA 5'-Ends and cDNA Cloning. Rapid amplification of cDNA 5'-ends (5'-RACE) was done with the SMART RACE cDNA Amplification Kit (Clontech Laboratories Inc., Palo Alto, CA) according to the User Manual. RACE PCR was done with a Universal Primer and two RARβ2 gene-specific reverse primers located at 1754 bp (RARβ2-1754RP, CKlACTGTGCTCTCKJltjrGTTCCCACTT) and 1216 bp (RARβ2-1216RP, GGTCrcCGATGOlX:AAGCCAGTOAA) 3' of the RARβ2 transcription site. PCR products were cloned into the pCR4-TOPO vector (Invitrogen, Carlsbad, CA) and sequenced with an ABI PRISM 377 DNA Sequencer (Applied Biosystems, Foster City, CA).

Reverse Transcription-PCR. RT was done in a final volume of 20 μL with 2 μg total RNA and 100 units of MuLV reverse transcriptase (Invitrogen) at 42°C for 30 minutes. Conventional PCR was mainly used to qualitatively detect gene expression. PCR was done with 1 μL RT product with PCR Supermix (Invitrogen). PCR primer pairs were RARβ2 (475FP, GACTGTATGC^TOTTCTCTCAO; and 730RP, ATTTOTCCtCdCACiACGAACCA) and RARβ5 (14FP, CTGGAAGCTCOTACACAGTGA; and 343RP, GGACATTCOCACT1C-AAAGC). β-actin (FP, GTCACCA ACT0COACG AC A; and RP, ltXICX^TCTCriX3C-TCGAA) was used as an internal control. Real-time PCR was done with 1 μL RT product on a 7900HT Sequence Detection System (ABI, Applied Biosystems) with ABI 2× SYBR Green PCR Master Mix (ABI 4309155) according to the recommended guidelines of ABI. Primer pairs for real-time PCR were RARβ2 (584FP, GAT1GACCCAAACCGAATCGCAGCA; and 730RP) and RARβ5 (15FP, GGAAGGT-OGTACACAGTGAATTTCTCTGAG; and RARβ2-730RP); real-time PCR data were analyzed with a software package (ABI Prism SDS2.1) provided with the instrumentation system.
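The text reports real-time PCR levels normalized to β-actin, with the basal HMEC level set to 1, but does not name the calculation. One common way to obtain such values is the comparative Ct (2^-ΔΔCt) method; the sketch below assumes that method and roughly 100% amplification efficiency (base 2). The function names and the Ct values in the usage line are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float,
                        base: float = 2.0) -> float:
    """Comparative Ct (2^-ddCt) fold change of a target gene.

    ct_target / ct_ref: Ct of the target (e.g. RARb5) and the reference gene
    (e.g. beta-actin) in the sample; *_cal: the same in the calibrator
    (e.g. untreated HMEC), whose relative level therefore equals 1.
    base=2.0 assumes perfect doubling per cycle.
    """
    delta_sample = ct_target - ct_ref        # normalize sample to reference gene
    delta_cal = ct_target_cal - ct_ref_cal   # normalize calibrator the same way
    delta_delta = delta_sample - delta_cal
    return base ** (-delta_delta)


def mean_of_experiments(fold_changes: Sequence[float]) -> float:
    """Average fold changes from independent experiments."""
    return mean(fold_changes)


if __name__ == "__main__":
    # Hypothetical Ct values: target amplifies 2 cycles earlier than in the calibrator.
    print(relative_expression(24.0, 16.0, 26.0, 16.0))  # 4.0
```

Because the calibrator is compared with itself, its relative level is exactly 1, which matches the "HMEC set as 1" convention in the figure legend.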

Expression Vector and in Vitro Translation. The RARβ5 expression vector was generated by PCR cloning with the pcDNA3.1/V5-His TOPO TA Expression Kit (Invitrogen). The open reading frame (ORF) of RARβ5 was isolated by PCR of 5'-RACE cDNA from MDA-MB435 cells with primers containing the RARβ5 start and stop codons (RARβ5-start, C AG AAG AATAX-CATTTACACITGTCACCG; and RARβ5-stop, GTCrMTTGCACGA-GT-GGTGACT0). The RARβ2 expression vector pTag(RARβ2β'−) (RARβ2 with a mutation to knock out a downstream translation start site) was a generous gift from Dr. Karen Swisshelm (Department of Pathology, University of Washington, Seattle, WA; ref. 11). The RARβ2 insert was cut from pTag(RARβ2β'−) and ligated into the BamHI sites of the pcDNA3.1 vector (Invitrogen) to generate the RARβ2 expression vector pcDNA3.1(RARβ2). The pcDNA3.1(RARβ2) was used both for in vitro translation, to generate RARβ2 protein as a positive control for Western blot, and for cotransfection, to test the effects of RARβ2 expression on RARβ5 promoter activity. In vitro translation was done with the TNT Quick Coupled Translation kit (Promega, Madison, WI).

Western Blot. When cells grew to 50 to 70% confluence, cell lysates were prepared and subjected to Western blot analysis as described previously (17). Two antibodies were used to detect RARβ isoforms: one recognizing amino acids 430-447 in the COOH-terminus of RARβ2 (sc-552, Santa Cruz Biotechnology Inc., Santa Cruz, CA), the other recognizing amino acids 407-423 in the COOH-terminus of RARβ2 (Geneka Biotechnology Inc., Montreal, Quebec). Two further antibodies recognizing the COOH-terminus of RARα (sc-551, Santa Cruz Biotechnology Inc.) and RARγ (sc-550, Santa Cruz Biotechnology Inc.) were used to detect RARα and RARγ isoforms.

The 3-(4,5-Dimethylthiazol-2-yl)-2,5-diphenyltetrazolium Bromide Assay for Cell Growth. Cell proliferation was examined by the colorimetric 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay. MTT is a pale yellow substrate that is cleaved by living cells to yield a dark blue formazan product. This process requires active mitochondria, and even freshly dead cells do not cleave substantial amounts of MTT. Briefly, cells (MX cells per well) were seeded in 96-well plates and cultured overnight. Cells were then incubated with 1 μmol/L retinoids, and the media were changed every second day. After 7-day treatment, 0.01 mL of MTT solution (5 mg/mL) was added to each well, mixed gently, and incubated with the cells at 37°C for 2 to 3 hours. The media were carefully removed, 0.1 mL of DMSO was added to each well, and plates were assayed for cell proliferation as described previously (18).
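The MTT results are reported as the percentage of the DMSO control ± SD of 8 wells. The exact arithmetic is not stated. The sketch below assumes blank-corrected absorbance, expresses each treated well relative to the mean of the control wells, and reports the mean and sample SD. The function name and the example readings are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def percent_of_control(treated_od: Sequence[float],
                       control_od: Sequence[float],
                       blank_od: float = 0.0) -> Tuple[float, float]:
    """Mean and SD of per-well viability as % of the DMSO-control mean.

    Each treated well is divided by the mean blank-corrected absorbance of
    the control wells; SD is the sample SD across treated wells.
    """
    if len(treated_od) < 2 or not control_od:
        raise ValueError("need >= 2 treated wells and >= 1 control well")
    control_mean = mean(od - blank_od for od in control_od)
    per_well = [100.0 * (od - blank_od) / control_mean for od in treated_od]
    return mean(per_well), stdev(per_well)


if __name__ == "__main__":
    # Hypothetical readings for 8 treated and 8 DMSO-control wells.
    treated = [0.40, 0.60] * 4
    control = [1.00] * 8
    print(percent_of_control(treated, control))
```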

RARβ5 Promoter-Luciferase Reporter Plasmids. The 1-kb 5' flanking region (P-1000/+33 relative to the transcription start site) of RARβ5 was first isolated by PCR from genomic DNA extracted from MDA-MB435 breast cancer cells with the Advantage 2 PCR kit (Clontech). Primer pairs were designed to contain a KpnI restriction site at the 5' end of the forward primer and an XhoI restriction site at the 5' end of the reverse primer. The PCR product was first cloned into the pCR4-TOPO vector and then subcloned into the KpnI/XhoI sites of the promoterless pGL3-Basic vector (Promega). Orientation and sequence of all of the constructs were verified by direct sequencing. All of the other promoter deletion mutant constructs (P-428/+33, -323/+33, -302/+33, -177/+33, and -99/+33) were cloned in the same way with pGL3-P-1000/+33 as a template.

Cell Transfections and Luciferase Assay. MCF-7 and T47D cells were plated at 1 to 1.2 × 10^5 cells per well in 12-well plates. After overnight incubation, the media were replaced by MEM containing 2% fetal bovine serum. Transient transfection was done in the same media with Lipofectamine 2000 (Invitrogen). Cells were transfected with 0J5 μg/well promoter constructs (0J5 μg/well for pGL3-P-1000/+33; the amounts of the other deletion mutants were adjusted correspondingly so that each well contained the same amount of plasmid) with or without 0.5 μg/well pcDNA3.1 empty vector or 0.64 μg/well pcDNA3.1(RARβ2) expression vector. A 20 ng/well pCMVβgal vector (Clontech) was cotransfected as an internal control for transfection efficiency. After 3 hours of incubation, the medium was replaced with fresh medium containing 1 μmol/L atRA, other retinoids, or DMSO (solvent control, 1 μL/mL media), and cells were incubated for an additional 24 hours. Luciferase and β-galactosidase activities were assayed with the Luciferase Reporter Assay Kit and the Luminescent β-gal Detection Kit II (Clontech).
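The luciferase readout was normalized to the cotransfected pCMVβgal control (reported as RLU) and compared with solvent-treated wells; the precise formula is not given. The sketch below assumes a per-well luciferase/β-gal ratio, averaged over replicate wells and expressed as fold over the DMSO control. The function names and numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def normalized_rlu(luc: Sequence[float], bgal: Sequence[float]) -> List[float]:
    """Per-well luciferase signal divided by the beta-gal transfection control."""
    if len(luc) != len(bgal):
        raise ValueError("luciferase and beta-gal readings must be paired per well")
    return [lu / bg for lu, bg in zip(luc, bgal)]


def fold_induction(luc_treated: Sequence[float], bgal_treated: Sequence[float],
                   luc_control: Sequence[float], bgal_control: Sequence[float]) -> float:
    """Mean normalized RLU of treated wells over that of solvent-control wells."""
    treated = mean(normalized_rlu(luc_treated, bgal_treated))
    control = mean(normalized_rlu(luc_control, bgal_control))
    return treated / control


if __name__ == "__main__":
    # Hypothetical triplicate wells: atRA-treated vs DMSO-treated.
    print(fold_induction([400, 400, 400], [10, 10, 10],
                         [100, 100, 100], [10, 10, 10]))  # 4.0
```

Normalizing each well before averaging means wells with unequal transfection efficiency are corrected individually rather than after pooling.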

RESULTS 

A Novel RARβ Transcript in MCF10A Breast Epithelial Cells. During characterization of RARβ2 expression in the MCF10A series of cell lines (14) with primers recognizing the RARβ2 coding region, we detected the RARβ2 transcript by reverse transcription (RT)-PCR, but we failed to detect the RARβ2 transcript with primers recognizing the RARβ2 5'-untranslated region (UTR). Hence, to examine the 5' region of RARβ2 in MCF10A cells, we did a 5'-RACE analysis of MCF10A total RNA with two RARβ2-specific primers (Fig. 1A). Using these two primers, we failed to detect the expected RARβ2 fragments of ~1.8 kb and ~1.3 kb, respectively, but did consistently detect a band of smaller size (~0.6 kb shorter). The 5'-RACE product was cloned and sequenced. A BLAST search of GenBank suggested it to be a novel RARβ isoform that has not yet been reported; henceforth it is referred to as RARβ5. The 5' end sequence of RARβ5 cDNA is presented in Fig. 1B. Only the first exon (Fig. 1B; exon 6, 58 bp, bold) is RARβ5 specific. We also used another primer set designed from the RARβ5-specific sequence (RARβ5-14FP) and the 3'-UTR sequence of RARβ2 (RARβ2-1827RP) to amplify a fragment spanning the whole coding region of RARβ5. Cloning and sequencing analysis of this PCR product showed a unique first exon of RARβ5, whereas all of the downstream exons are common to all of the isoforms of RARβ. Alignment of the first exon of RARβ5 to bacterial artificial chromosome (BAC) clones RP11-21F9 and RP11-733H11 (GenBank accession nos. AC133141.2 and AC098477.2) shows that the first exon of RARβ5 is located ~29.4 kb downstream of the first exon (exon 5) of RARβ2 and ~2.8 kb upstream of the second exon of RARβ2. Therefore, this novel exon is numbered exon 6, and the numbering of the downstream exons is updated from what was reported previously (ref. 11; Fig. 1C).
The 5'-UTR of RARβ5 mRNA is 237 nucleotides long and contains 2 upstream ORFs (uORFs; Fig. 1B). We note in this respect that both RARβ2 and RARβ4 contain multiple uORFs (8, 11) that could play a role in tightly controlling translation efficiency. Unlike RARβ2, the first exon of RARβ5 does not contain any translation start codon; the 5'-most AUG of the RARβ5 transcript that is in frame with the RARβ5 coding sequence is located at nucleotide 238 of RARβ5 mRNA (within the third exon of RARβ5) and corresponds to an internal methionine codon at amino acid 113 of the RARβ2 protein (Fig. 1, B and C). This AUG is within an appropriate nucleotide context for translation initiation (19) and would result in a protein of 336 amino acids with an estimated molecular mass of ~37 kDa (Fig. 1C). In vitro translation of the RARβ5 expression vector confirmed that this AUG is a functional translation initiation codon (Fig. 1D). It should be noted that in vitro translation generated multiple protein bands, which is consistent with a previous report (11) and might not happen in the cells, probably because of the lack of the whole UTR region in the expression vector and the lack of a natural chromatin environment. This RARβ5 protein product is identical to a truncated RARβ2 or RARβ4 protein reported previously (7, 11). The predicted amino acid sequence is given in Fig. 1E.

Fig. 1. Identification of RARβ5 in MCF10A cells. A, agarose gel analysis of 5'-RACE products. The RACE products from MCF10A total RNA were analyzed by agarose gel, and a novel RARβ isoform, RARβ5, was identified. M, 100-bp DNA ladder; 1, 5'-RACE PCR with primers UPM and RARβ2-1754RP; 2, 5'-RACE PCR with primers UPM and RARβ2-1216RP. B, 5' end sequence of the RARβ5 cDNA. Nucleotides are numbered relative to the transcription start site. The new exon (exon 6) sequence is bolded. Translation of RARβ5 begins at +238 of the RARβ5 transcript, indicated above the right-angled arrow. The uORFs in the 5' UTR are underlined and labeled above their coding sequences. Exon junctions are indicated above straight arrows. C, a schematic diagram comparing RARβ5 and RARβ2 mRNA and protein structures. Protein domains (A-F) of the RARβ isoforms are depicted to scale above (RARβ2) or below (RARβ5) diagrams representing their respective mRNA transcripts. The molecular masses are theoretical values predicted on the basis of amino acid sequence. The positions of the translation start site and stop codon are indicated with short arrows along the mRNA diagrams. Note that RARβ5 translation begins within the C region, resulting in loss of the A and B domains and half of the C domain. D, Western blot analysis of products of in vitro translation with the RARβ5 expression vector. E, predicted amino acid sequence of RARβ5.
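The translation-start analysis described for RARβ5 (uORFs in the 5'-UTR, a downstream AUG in a favorable initiation context, and a 336-residue product starting at Met113 of RARβ2) can be illustrated with a small sketch. Assumptions: the toy mRNA is hypothetical because the scanned sequence is illegible; "appropriate context" is taken to mean the two key Kozak positions (purine at -3, G at +4); the ~110 Da average residue mass is a rule of thumb; the parent length of 448 residues is derived from the text as 113 + 336 - 1. All function names are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (approximation)


def find_uorfs(mrna: str, cds_start: int) -> List[Tuple[int, int]]:
    """(AUG start, end after stop) for ORFs whose AUG lies 5' of cds_start.

    Positions are 0-based; U is read as T. An AUG with no in-frame stop in
    the given sequence is skipped.
    """
    seq = mrna.upper().replace("U", "T")
    uorfs = []
    for i in range(min(cds_start, len(seq) - 2)):
        if seq[i:i + 3] != "ATG":
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3))
                break
    return uorfs


def kozak_features(mrna: str, aug: int) -> Tuple[bool, bool]:
    """Purine at -3 and G at +4, counting the A of AUG as +1."""
    seq = mrna.upper().replace("U", "T")
    purine_minus3 = aug >= 3 and seq[aug - 3] in "AG"
    g_plus4 = aug + 3 < len(seq) and seq[aug + 3] == "G"
    return purine_minus3, g_plus4


def truncated_protein(parent_length: int, start_met: int) -> Tuple[int, float]:
    """Length and approximate mass (kDa) of a protein begun at an internal Met."""
    length = parent_length - start_met + 1
    return length, length * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    toy = "CCAUGUAAGCCACCAUGGCUUAA"     # hypothetical: one uORF, then the main AUG at index 14
    print(find_uorfs(toy, cds_start=14))  # [(2, 8)]
    print(kozak_features(toy, 14))        # (True, True)
    print(truncated_protein(448, 113))    # ~336 residues, ~37 kDa
```

Counting 336 residues at ~110 Da each gives ~36.96 kDa, which agrees with the ~37 kDa estimate in the text.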

AtRA-Mediated Expression and Regulation of RARβ5. Identification of RARβ5 raised the question of whether its expression is mediated by atRA. To directly examine the presence of RARβ5 in comparison with RARβ2 in patients, we used human breast cancer cells derived from patients and maintained at early (<10) in vitro passages. In addition, we examined the expression of both RARβ isoforms in established breast cancer cell lines and normal human mammary epithelial cells. RT-PCR analysis showed that all of the examined breast cancer cells expressed detectable RARβ5 mRNA, but its expression was differentially regulated by atRA treatment (Fig. 2A). It was up-regulated by atRA in BCA-1, 3, and 4 cells, whereas BCA-8 showed a slight down-regulation of RARβ5 by atRA. Similarly, the level of RARβ2 was up-regulated by atRA in some cells (BCA-1, 3, 4, 8, 9, and 10), whereas in others it remained unaltered. However, no correlation between RARβ2 and RARβ5 expression could be established. Because the estrogen receptor (ER) plays a critical role in the response of breast cancer to various chemotherapeutic agents, and to quantitate the expression of RARβ5 and RARβ2 mRNA, we did real-time PCR with HMEC, the ER-positive breast cancer cell lines MCF-7 and T47D, and the ER-negative breast cancer cell line MDA-MB435, which expresses a high level of RARβ2 mRNA (11). Real-time PCR clearly showed that RARβ5 was preferentially up-regulated by atRA in MCF-7 cells, whereas RARβ2 was preferentially up-regulated by atRA in T47D cells (Fig. 2B). ER-negative MDA-MB435 cells expressed a high level of RARβ2 but a low level of RARβ5 relative to HMEC cells. RARβ5 and RARβ2 were consistently expressed at a low level in ER-positive MCF-7 and T47D cells relative to normal HMEC cells (Fig. 2B). Because other retinoids function through a similar mechanism in the cells, it is reasonable to expect that other retinoids might also mediate RARβ5 expression. Using conventional RT-PCR, we confirmed that 4-HPR also differentially up-regulated RARβ5 expression in MCF-7, T47D, and BCA-3 cells. Fig. 2C presents real-time RT-PCR analysis of RARβ5 expression mediated by different RAR/RXR selective ligands in T47D cells. In addition to atRA, Am80 (RARα/β selective ligand) and LGD1069 (RXR selective ligand) also significantly up-regulated RARβ5 expression, whereas 4-HPR (a weak RARγ ligand) and 9-cisRA (RAR/RXR ligand) had relatively low efficacy in mediating RARβ5 expression (Fig. 2C).

Fig. 2. RT-PCR analysis of RARβ5 and RARβ2 mRNA expression and regulation by retinoids. A, cells were treated with 1 μmol/L atRA for two days, and total RNA was subjected to RT-PCR analysis with specific primers. Both RARβ5 and RARβ2 are differentially expressed in all of these tumor cells and regulated by atRA. B, real-time PCR analysis showing relative levels of RARβ5 and RARβ2 mRNA normalized to β-actin (the basal RARβ mRNA level in HMEC is set as 1) after atRA treatment (1 μmol/L atRA for 24 hours) in HMEC and breast cancer cell lines. Results are expressed as the mean value of two independent experiments. C, real-time PCR analysis showing relative levels of RARβ5 mRNA normalized to β-actin after treatment with RAR/RXR selective ligands (1 μmol/L for 24 hours) in T47D cells. The atRA served as a positive control. Results are expressed as the mean value of duplicate analyses of the same cDNA samples.

Fig. 3. Detection of endogenous levels of RARβ5 protein in normal and breast cancer cells and cellular sensitivity to retinoid treatment. A and B, Western blot analysis of extracts of various cells with RARβ-specific antibodies. Twenty micrograms (A) and 60 μg (B) of total protein from each cell line were loaded for immunoblot analysis with two polyclonal antibodies raised against different COOH-terminal RARβ epitopes [A, antibody recognizing amino acids 430-447 of RARβ2 (sc-552, Santa Cruz Biotechnology); and B, antibody recognizing amino acids 407-423 of RARβ2 (16021 MX, Geneka)], respectively. RARβ5 protein was detected as a ~37 kDa protein band. C, β-actin was used as an internal control. D, Western blot analysis of products of in vitro translation with different vectors. The positions of molecular mass markers are indicated to the right. RARβ2 (~55 kDa) was not detectable in any of the cell lines except the positive control (D). E, MTT assay of cell proliferation in response to retinoids. Data are expressed as the percentage of DMSO control ± SD of 8 wells. All of the data shown are representative of three independent experiments. P < 0.01 compared with control; P < 0.001 compared with control.

RARβ5 Protein Expression in Correlation to Cellular Resistance to atRA. We did Western blots to analyze RARβ protein expression in a panel of breast epithelial cells (normal HMEC; ER-positive MCF-7 and T47D; ER-negative MDA-MB231, MDA-MB435, human breast carcinoma BCA-2 and BCA-8, benign MCF10A, and premalignant MCF10AT breast epithelial cells) with two RARβ polyclonal antibodies raised against amino acids at the COOH-terminus (a region common among human RARβ isoforms). A protein band with the expected molecular mass (~37 kDa) was detected in HMEC, MDA-MB231, BCA-2, MCF10A, and MCF10AT cells by both antibodies (Fig. 3, A and B), suggesting that this protein could be RARβ5. An RARβ2 protein band (~55 kDa) was detected only in the in vitro translation product (Fig. 3D). None of the cell lines expressed detectable RARβ2. In another experiment, we were not able to detect RARβ2 protein expression in any of the BCA (-1, -2, ... -11) cell lines, but we detected RARβ2 protein in HMEC cells that were from a different source (Cambrex Bio Science Inc., Walkersville, MA)⁴ with the same antibody (C-19, Santa Cruz Biotechnology), indicating that RARβ2 protein expression could be cell-type specific. The ~37 kDa protein detected in this study could be identical to the RARβ protein isoform (termed RARβ4, ~40 kDa) identified previously in breast cancer cells (11), because the same antibody was used for detection and the molecular size is very close, considering the 10% margin of error for our molecular mass standards. This RARβ protein isoform seemed to be preferentially expressed in ER-negative normal and cancerous breast epithelial cells, but it was not detectable in the ER-positive breast cancer lines MCF-7 and T47D.

To evaluate whether RARβ5 expression is associated with cellular resistance to retinoids, cell lines expressing different levels of RARβ5 protein (Fig. 3, A and B) were selected to assess their sensitivity to retinoids with the MTT assay. Immortalized benign MCF10A cells, which express a high level of RARβ5, were resistant to both atRA and 4-HPR; ER-negative MDA-MB231 cells, which also express RARβ5 protein, showed resistance to atRA, but 4-HPR effectively inhibited their proliferation. ER-positive MCF-7 and T47D cells, in which RARβ5 protein expression was not detectable, were sensitive to both atRA and 4-HPR (Fig. 3E). In addition, MDA-MB435, MCF10AT, and BCA-2 cells, which express detectable RARβ5 protein, were relatively resistant to atRA (data not shown). These results suggest that RARβ5 might contribute to cellular resistance to atRA, which functions through a receptor-dependent pathway. RARβ5 did not have much influence on the effect of 4-HPR, which functions through both receptor-dependent and receptor-independent pathways (20).

Sequence Analysis of the 5' Flanking Region of RARβ5. Although the pattern of RARβ5 regulation by atRA was more or less similar to that of RARβ2, in some cells, such as BCA-2, BCA-8, MCF-7, and T47D, the expression pattern was significantly different. Sequence alignment showed that the first exon of RARβ5 is far away (~30 kb) from the P2 promoter, suggesting that RARβ5 and RARβ2 use different promoters. Therefore, we cloned and sequenced the 1-kb 5' flanking region of RARβ5. Fig. 4 shows the 480-bp 5' flanking sequence of RARβ5. The 5' flanking sequence (-1000/-59) of RARβ5 was analyzed with several promoter identification programs, including Proscan (21), Promoter 2.0 (22), BDGP: Neural Network Promoter Prediction (http://www.fruitfly.org/seq_tools/promoter.html), McPromoter (http://genes.mit.edu/McPromoter.html), and PromoterInspector (http://www.genomatix.de/). None of these programs was able to predict this promoter. The region close to the putative transcription start site lacks the canonical TATA and CCAAT boxes, but a TATA-like box (TATAATT) is present 42 bp upstream of the transcription start site. Additional analysis with MatInspector (23) and TFSEARCH (http://www.cbrc.jp/research/db/TFSEARCH.html) identified DNA binding sites for AIM, GATA, SRY, CEBPβ, NFAT, OCT1, and others in the region -500/+1 (Fig. 4). In addition, PROMO (24) identified a binding site for RXRα 312 bp upstream of the transcription start site. However, these are only potential binding sites, and functional tests are needed to confirm their importance.

⁴ X. Peng, D. Yun, K. Christov, unpublished data.

Fig. 4. Nucleotide sequence of the 5' flanking region of the human RARβ5 gene. The transcription start site is indicated in bold. Shading denotes the core sequences of potential transcription factor binding sites with high identity to authentic core and matrix sequences, as identified by MatInspector (23). Nucleotides are numbered negatively to the left of the sequence, with nucleotide +1 corresponding to the transcription start site. A TATA-like consensus sequence is boxed.

Table 1. Transcription initiation site mapping (5'-RACE) of the hRARβ5 gene in breast epithelial cells

          hRARβ5 transcriptional start site sequence
Clone     MCF10A           MDA-MB435
1         ATAGAAAAAAT      ATAGAAAAAAT
2         ATAGAAAAAAAT     ATAGAAAAAAT
3         ATAGAAAAAAT      ATAGAAAAAAT
4         ATAGAAAAAAT      ATAGAAAAAAT
5         ATAGAAAAAAT      -
6         ATAGAAAAAAT      -
Transcription Start Site of RARβ5. To determine the transcription start site of RARβ5, we did 5'-RACE analysis on mRNA from MDA-MB435 and MCF10A cells. The 5' regions of both RARβ2 and RARβ5 were cloned from MDA-MB435 cells and sequenced. Because only a single transcription start site has been defined in the human RARβ2 gene (9), RARβ2 cDNA cloning and sequencing served as a good control for mapping the transcription start site of RARβ5 by 5'-RACE analysis. In total, six RARβ5-positive clones from MCF10A 5'-RACE products, and four RARβ5-positive and two RARβ2-positive clones from MDA-MB435, were sequenced. Sequence analysis showed a single transcription start site of RARβ5 in both MCF10A and MDA-MB435 cells (Table 1). Interestingly, a single-nucleotide (A) deletion close to the transcription start site was found in five of the six clones from MCF10A cells and in all four clones from MDA-MB435 cells. However, because no deletion was found in the corresponding DNA region in MDA-MB435 cells, it seems that the deletion occurred during transcription. Two transcription start sites of RARβ2 [+1 and -11; +1 identifies the first nucleotide of the putative transcription start site based on GenBank sequence data (NM_000965)] were identified in MDA-MB435 cells; it seems that RARβ2 has multiple transcription start sites in this cancer cell line.
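The single-A deletion described above can be tallied directly from the clone sequences in Table 1 by comparing the length of the poly(A) stretch in each clone with that of the genomic reference. The sketch below assumes that the one undeleted clone (seven A's after ATAG) reflects the genomic sequence, consistent with the statement that no deletion was found in genomic DNA; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Clone sequences as listed in Table 1.
CLONES: Dict[str, List[str]] = {
    "MCF10A": ["ATAGAAAAAAT", "ATAGAAAAAAAT", "ATAGAAAAAAT",
               "ATAGAAAAAAT", "ATAGAAAAAAT", "ATAGAAAAAAT"],
    "MDA-MB435": ["ATAGAAAAAAT"] * 4,
}
# Assumption: the undeleted clone (seven A's) matches the genomic sequence.
GENOMIC_REFERENCE = "ATAGAAAAAAAT"


def a_run_length(seq: str) -> int:
    """Length of the poly(A) stretch between the ATAG anchor and the next T."""
    match = re.search(r"ATAG(A+)T", seq.upper())
    if match is None:
        raise ValueError(f"anchor not found in {seq!r}")
    return len(match.group(1))


def count_single_a_deletions(clones: List[str],
                             reference: str = GENOMIC_REFERENCE) -> int:
    """Number of clones whose poly(A) run is exactly one A shorter than the reference."""
    ref_len = a_run_length(reference)
    return sum(1 for clone in clones if a_run_length(clone) == ref_len - 1)


if __name__ == "__main__":
    for cell_line, clones in CLONES.items():
        print(cell_line, count_single_a_deletions(clones), "of", len(clones))
```

Run on the Table 1 sequences, this reproduces the counts stated in the text: five of six MCF10A clones and four of four MDA-MB435 clones.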



RARβ5 Promoter Activity. A series of RARβ5 promoter-luciferase reporter vectors was constructed. These constructs were transfected into MCF-7 and T47D cells (which are ER-positive and responsive to atRA) and assayed for reporter gene activity. Although promoter activity differed between the two cell lines, the region -99/+33 consistently showed negligible or very low promoter activity in both, suggesting either the existence of a negative regulatory element within this region or the presence of a strong activator in the region between -177 and -99 (Fig. 5, A and B). In MCF-7 cells, pGL3-P-1000/+33, -428/+33, -323/+33, -302/+33, and -177/+33 exhibited significant basal promoter activity (relative to the empty pGL3-Basic vector control). The atRA treatment further increased promoter activity by 2- to 5-fold in MCF-7 cells but only 2- to 3-fold in T47D cells, in agreement with the real-time PCR results (Fig. 2B). Deletion of region -302/-177 significantly decreased promoter activity induced by atRA in MCF-7 cells. On the basis of transfection assay data from both cell lines, the promoter region -302/-99 appears to be the target region for atRA stimulation, although no RARE/RXRE was identified in this region. Therefore, either the target binding site has not been identified, or the stimulatory effect of atRA is indirect.
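The deletion-mapping logic above (compare atRA induction between successive 5' deletion constructs and flag the interval whose removal sharply reduces it) can be written as a short routine. This is a sketch only: the fold-induction values in the usage example are made-up illustrative numbers, not data from this study, and the 50% drop threshold and the function name are hypothetical choices.

```python
from typing import Dict, List, Tuple


def map_responsive_intervals(fold_by_start: Dict[int, float],
                             drop_ratio: float = 0.5) -> List[Tuple[int, int]]:
    """Return (upstream, downstream) 5' boundaries whose deletion cuts induction.

    fold_by_start maps each construct's 5' end (e.g. -1000 for P-1000/+33)
    to its fold induction by atRA. An interval is flagged when the shorter
    construct retains at most drop_ratio of the longer construct's induction.
    """
    starts = sorted(fold_by_start)  # most upstream (most negative) first
    intervals = []
    for longer, shorter in zip(starts, starts[1:]):
        if fold_by_start[shorter] <= fold_by_start[longer] * drop_ratio:
            intervals.append((longer, shorter))
    return intervals


if __name__ == "__main__":
    # Made-up illustrative values for nested constructs ending at +33.
    hypothetical = {-1000: 4.0, -428: 4.2, -323: 3.9, -302: 4.1, -177: 1.8, -99: 1.0}
    print(map_responsive_intervals(hypothetical))  # [(-302, -177)]
```

A ratio threshold is used rather than an absolute difference so that constructs with different basal activity are compared on the same scale.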

Because atRA functions through a receptor-dependent pathway, we hypothesized that expression of RARβ2 could affect atRA-mediated RARβ5 promoter activity. To test this hypothesis, the RARβ2 expression vector (pcDNA3.1-RARβ2) was cotransfected with RARβ5 promoter constructs into MCF-7 and T47D cells. After transfection, the cells were incubated in the presence or absence of atRA for 24 hours, and promoter activity was measured by luciferase assay. Surprisingly, cotransfection of the empty vector pcDNA3.1 also greatly increased RARβ5 promoter activity, whereas cotransfection of the RARβ2 expression vector pcDNA3.1-RARβ2 did not cause a significant increase in promoter activity relative to the control (Fig. 5, C and D). However, in the presence of atRA, RARβ2 expression did significantly increase promoter activity compared with that of empty vector-transfected MCF-7 cells (Fig. 5C), suggesting that activation of the RARβ5 promoter by atRA is receptor dependent.

We also examined RARβ5 promoter activity in the presence of other RAR/RXR selective ligands in both MCF-7 and T47D cells. As shown in Fig. 6, A and B, all of the tested RAR/RXR selective ligands differentially increased RARβ5 promoter activity, and Am80 showed the highest efficacy. These data also confirmed our RT-PCR analysis (Fig. 2C).

Fig. 5. RARβ5 promoter activity in MCF-7 and T47D cells. A and B, the region -302/-99 was found to be the target region for atRA-induced promoter activity.

Fig. 6. RARβ5 promoter activity is up-regulated by various RAR/RXR selective ligands in MCF-7 and T47D cells. Cells were transfected with the pGL3-P-1000/+33-RARβ5 promoter construct and treated with 1 μmol/L retinoids for 24 hours. Results are from triplicate wells of one experiment; bars, mean ± SEM. RLU, relative luciferase activity normalized to β-gal. The atRA treatment served as a positive control.

Fig. 7. RARβ5 is unique among all of the RARs and is not a cleaved protein from RARβ. Western blot analysis of RARβ, RARα, and RARγ in MCF-7 cells. Cells were treated with DMSO (control) or MG132 (50 μmol/L) for 5 hours; cell lysates (50 μg) were subjected to Western blot analysis with polyclonal antibodies against RARβ (Santa Cruz Biotechnology, sc-552), RARα (Santa Cruz Biotechnology, sc-551), and RARγ (Santa Cruz Biotechnology, sc-550).

Is RARβ5 a Unique Isoform among RARs? The identification of RARβ5 raises the question of whether a corresponding isoform exists in the other two RARs. Sequence analysis suggested that a corresponding isoform might exist in both RARα and RARγ, because a similar internal start codon is present in both genes at similar positions. In addition, Western blot analysis with RARα- and RARγ-specific antibodies, which are not cross-reactive with each other and recognize the COOH terminus of the corresponding RAR isoform, detected a similar protein band at ~37 kDa in MCF-7 and MDA-MB231 cells. We therefore first performed 5'-RACE on mRNA extracted from MCF-7 and MDA-MB231 cells with RARα-specific primers, but failed to detect any smaller cDNA band corresponding to the putative truncated RARα. Only a single band of the size expected for full-length RARα was detected (data not shown). Because these receptor isoforms degrade quickly, we hypothesized that the lower molecular weight band observed for RARα and RARγ might be a fragment generated by protease cleavage. We used the cell-permeable proteasome inhibitor MG132, which can inhibit the degradation of RARα and RARγ (25). MG132 treatment completely blocked the generation of this fragment (Fig. 7), showing that these bands are products of protease cleavage of RARα and RARγ. The biochemical and biological properties of these RARα and RARγ fragments are not clear at present. When cells were treated with MG132, a corresponding band for RARβ5 was observed. Whether the expression of RARβ5 protein is due to inhibition of protein degradation or to induction by MG132 treatment requires further in-depth study.
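The argument above starts from a sequence observation: an internal start codon, in frame with the full-length reading frame, could give rise to an NH2-terminally truncated receptor. The sketch below illustrates that kind of check. It scans a coding sequence for ATG codons in frame with the annotated start and counts how many codons each would translate before a stop. The function names and toy sequences are hypothetical. The check considers reading frame only. It does not predict whether a given ATG is actually used for initiation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_inframe_atgs(cds):
    """0-based positions of ATG codons in frame with the annotated start codon,
    excluding the start codon itself."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("expected a coding sequence beginning with ATG")
    return [i for i in range(3, len(cds) - 2, 3) if cds[i:i + 3] == "ATG"]


def codons_to_stop(cds, start):
    """Number of sense codons read from `start` up to the first in-frame stop codon
    (or to the end of the sequence if no stop is found)."""
    cds = cds.upper()
    n = 0
    for i in range(start, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            break
        n += 1
    return n


if __name__ == "__main__":
    toy_cds = "ATGCATGAAATGTAA"  # hypothetical; the ATG at index 4 is out of frame
    for pos in internal_inframe_atgs(toy_cds):
        print(pos, codons_to_stop(toy_cds, pos))
```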

Because a major level of regulatory control by retinoids is post-translational, we examined the effect of atRA treatment on RARβ5 protein expression with and without proteasomal inhibition. Because proteasomal inhibitors are generally cytotoxic, we used an exposure period of 8.5 hours, which was found not to trigger significant cell death in the cells examined. Cells were treated for 8 and 24 hours with 1 μmol/L atRA and then treated with or without 40 μmol/L MG132 for the final 8.5 hours. We did not observe significant alteration of the RARβ5 protein level in MDA-MB435 cells, and no RARβ protein was detectable in T47D cells under either condition (data not shown). Nevertheless, MG132 effectively blocked atRA-induced RARα degradation in both cell lines, as observed in MCF-7 cells (25).

Genomic Structure of RARβ5 in Comparison to RARβ2. On the basis of the identified RARβ5 cDNA sequence, the known RARβ2 sequence, and the published Human Genome Project data (http://www.ncbi.nlm.nih.gov/genome/guide/human/), we were able to elucidate the complex organization of the RARβ5 and RARβ2 genes. A BLAST search allowed us to align the first three exons of RARβ5 to a 70560-bp BAC clone, RP11-421F9 (GenBank accession no. AO 33 141), mapped to chromosome 3p24. Similarly, the remaining exons were precisely aligned within another 189308-bp BAC clone, RP11-659P16 (GenBank accession no. AC09341&). The first exon of RARβ2 was then aligned within the 198468-bp BAC clone RP11-733H11 (GenBank accession no. AC098477). Further alignment of these three clones showed a 4145-bp overlap between clones RP11-733H11 and RP11-421F9 and an 1862-bp overlap between clones RP11-421F9 and RP11-659P16 (Fig. 8), demonstrating the continuity of the gene sequence across these BAC clones. Analysis of these three BAC clones revealed that the RARβ5 gene spans over 130 kb of DNA, whereas the RARβ2 gene spans over 160 kb. All of the splice junctions conform to the GT/AG rule for splice donor and acceptor sites (ref. 26; Fig. 8A). Fig. 8B summarizes our analysis of the genomic structures of the hRARβ5 and hRARβ2 genes, and a new numbering of their exons is proposed here.
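The GT/AG rule mentioned above says that an intron begins with the donor dinucleotide GT and ends with the acceptor dinucleotide AG. A minimal sketch of checking this from exon coordinates follows. It assumes sorted, non-overlapping exons given as 0-based, half-open (start, end) pairs on the transcribed strand of the genomic sequence. The coordinate convention, the function names, and the toy sequence are all assumptions for illustration.

```python
def intron_sequences(genomic, exons):
    """Introns between consecutive exons.

    `exons` are sorted, non-overlapping (start, end) pairs in 0-based,
    half-open coordinates on the transcribed strand of `genomic`.
    """
    return [genomic[prev_end:next_start]
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def follows_gt_ag(intron):
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def all_junctions_canonical(genomic, exons):
    """True if every intron implied by the exon list obeys the GT/AG rule."""
    return all(follows_gt_ag(seq) for seq in intron_sequences(genomic, exons))


if __name__ == "__main__":
    # Hypothetical toy locus: exon AAA | intron GTCCCAG | exon TTT | intron GTAAAAG | exon CC
    toy = "AAAGTCCCAGTTTGTAAAAGCC"
    print(all_junctions_canonical(toy, [(0, 3), (10, 13), (20, 22)]))
```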

DISCUSSION 

The major biological effects and gene expression induced by retinoids are believed to be mediated by the nuclear receptors RARs and RXRs. Because RARs and RXRs are the primary effectors of retinoid signaling, they themselves seem to be targets for disruption in tumorigenesis. RARβ has been extensively studied in human carcinomas, and several studies have suggested that it might play a role in tumor suppression (27-29). RARβ has therefore been considered a target molecule for retinoids in chemoprevention and therapeutic studies.

In this study, we identified RARβ5, a novel RARβ isoform directed by a distinct promoter, P3. We also provided the first evidence showing … identified truncated RARβ protein (7, 11). In 1999, Sommer et al. (11) identified a 40 kDa RARβ protein isoform, which was interpreted as RARβ4. This RARβ protein isoform was found to be elevated in human breast tumor cells, especially in the cytoplasm, relative to RARβ2 protein. In 2002, Chen et al. (7) showed an antagonistic role for this RARβ protein isoform in retinoic acid signaling, termed it "RARβ'," and interpreted its expression as the result of "leaky scanning." In the same year, another group (15) showed that expression of this protein isoform is associated with cellular resistance to retinoids. They also interpreted it as RARβ4 and tried to link its protein expression to RARβ4 mRNA expression. These data seemed correct, but the interpretation of how that RARβ (RARβ4 or RARβ') protein isoform is generated seems questionable. There has been no evidence showing that the protein isoform is translated from endogenous RARβ2 or RARβ4 transcripts. The interpretation was based on transfection and in vitro translation experiments, in which the expression vectors generally do not contain the full-length 5' and 3'UTRs, and the reporter genes are not in a natural chromatin environment. In addition, the presence of multiple uORFs in the long 5'UTR region of RARβ2 and RARβ4 could also inhibit leaky scanning (30). Some cells, such as the MCF10A series of cell lines, do not express detectable RARβ4 mRNA, so in these cells the same protein isoform could only come from the RARβ5 transcript. The identification of RARβ5 in breast epithelial cells suggests that RARβ' is the primary translation product of the ORF of the RARβ5 transcript. Should leaky scanning occur with RARβ2 or RARβ4, it might produce RARβ' at a very low level (30, 31).

Moreover, the existence of multiple uORFs and the long leader sequence in RARβ isoform mRNAs could be a signal that their translation is under tight control.
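The uORF argument in the two paragraphs above can be made concrete with a simple scan. The sketch below lists ATG-initiated open reading frames that start and terminate inside a 5'UTR. It is hypothetical and simplified. It ignores non-AUG starts, start-codon context, and uORFs that overlap the main coding sequence because they have no stop codon within the UTR. The function name and the toy sequence are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5):
    """Return (start, end) spans, 0-based and half-open, of ATG-initiated ORFs
    whose in-frame stop codon also lies within the 5'UTR."""
    seq = utr5.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))  # span includes the stop codon
                break
    return orfs


if __name__ == "__main__":
    toy_utr = "CCATGAAATAGCCATGTGACC"  # hypothetical leader with two short uORFs
    print(find_uorfs(toy_utr))
```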

In most of the previous studies, RARβ expression was measured preferentially at the mRNA level (25), which adds a certain level of complexity to understanding its function. Moreover, in most breast cancer cell lines, RARβ2 protein was not detected by Western blotting, although its mRNA was detectable (Fig. 2). We have carried out experiments to address these concerns to a certain extent and to study the expression of RARβ5 at both the RNA and protein levels in various patient-derived primary breast cancer cells, established breast cancer cell lines, the immortalized benign MCF10A and premalignant MCF10AT cell lines, and normal human breast epithelial cells. At the mRNA level, RARβ5 is expressed in normal human breast epithelial cells as well as in benign, premalignant, and tumor cell lines. In the presence of atRA, the level of RARβ2 mRNA is preferentially elevated relative to RARβ5 in T47D cells. At the protein level, we failed to detect endogenous RARβ2 protein by Western blotting but did detect a corresponding RARβ5 band in MDA-MB231 and HMEC cells. In agreement with our studies, Tanaka et al. (25) also failed to detect RARβ2 protein under their experimental conditions. Hence, either RARβ2 protein is not stable, or its expression is too low to be detected in these cells.

RARβ5 identification also defines a new type of RARβ isoform, which is under the control of a distinct promoter, P3, and whose protein lacks the A and B domains and part of the C domain (the first zinc finger) of the other RARβ isoforms. The loss of DNA binding ability, combined with retained capability to form heterodimers with RXR, makes RARβ5 a dominant-negative regulator of RARβ function (7). In this respect, we note that similarly truncated RARα isoforms lacking all of the sequence NH2-terminal to the second zinc finger have been identified previously (32). Most analogous to RARβ5 is the progesterone receptor C mRNA, which encodes an NH2-terminally truncated progesterone receptor (33, 34). RARβ5 protein expression, localization, and function were characterized previously as a truncated RARβ protein isoform (7, 11, 15). Unlike some other reported dominant-negative nuclear receptors (35, 36), RARβ5 does not bind cis-acting DNA elements and therefore cannot directly inactivate gene transcription. RARβ5 likely represses by stoichiometric competition, away from the RARE, against other transcription factors within the cell (e.g., RARα, RARβ, and RARγ) for transcription cofactors (7). Although RARβ4 protein (identical to RARβ5 protein) was reported to be elevated in breast cancer cells (11, 15), our data show that both RARβ5 mRNA and protein are expressed in normal HMEC cells. This indicates that RARβ5 is not a tumor-specific isoform; it could be a regulatory factor for RARβ target genes in both normal and tumor breast epithelial cells. We could not detect RARβ5 protein in ER-positive breast cell lines that are sensitive to retinoids, whereas it was detected in ER-negative breast cancer cells and normal breast epithelial cells that are relatively resistant to retinoids, suggesting that this isoform may contribute to cellular resistance to retinoids. In the metastatic, atRA-resistant M-4A4 cell line derived from MDA-MB435 cells, RARβ4 protein (identical to RARβ5) was elevated compared with the isogenic nonmetastatic NM-2C5 cell line, and its expression was also up-regulated by atRA after 4- and 6-day treatment (15). This is consistent with our RT-PCR and MTT data and with the conclusion that RARβ5 plays a negative role in RARβ2 function (7).

Computer-based analysis of the RARβ5 5' flanking region failed to predict the P3 promoter, indicating that P3 is not a typical promoter. The functionality of the TATA-like box 42 bp upstream of the transcription start site remains unclear. In this respect, another noncanonical TATA box (TATATTA) has been reported in the P2 promoter of RARβ (9). However, cloning and transfection studies of the RARβ5 5' flanking region confirmed the presence of the P3 promoter. The atRA target promoter region (-302/-99) lacks any canonical RXRE/RARE elements. The magnitude of activation of the RARβ5 promoter by atRA seemed to be cell-type specific. Cotransfection of the empty vector pcDNA3.1 resulted in a significant increase in reporter gene activity, for reasons that are currently unknown. Transfection experiments can generate artifacts, so a control (empty) vector must be included for comparison when assessing the function of the transfected gene. The effect of atRA seems to be at least partially RARβ2 dependent, whereas RARβ2 overexpression by itself might not have a significant effect on RARβ5 promoter activity in the absence of the ligand atRA. Other RAR/RXR selective retinoids also differentially increased RARβ5 promoter activity, indicating that both RARα and RXRs can be involved in RARβ5 transcriptional activation. Because only two promoters have previously been identified in all of the RAR genes (11), the identification of this promoter has biological significance. Both RARβ2 and RARβ5 can be transactivated by atRA in the same cells, whereas their functions seem to differ, revealing a mechanism for fine-tuning atRA-induced transcription.
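The distinction between canonical and noncanonical TATA boxes drawn above can be illustrated with a consensus scan. The sketch below uses TATAWAW (W = A or T), a commonly cited TATA-box consensus. That choice is an assumption here; the text does not name the consensus or the prediction program that was used. Positions are reported relative to a supplied transcription start site, so upstream hits are negative, matching the "42 bp upstream" convention in the text. Under this consensus, TATATTA fails at the sixth position (T instead of A).

```python
import re

# TATAWAW consensus (W = A or T); the lookahead lets overlapping hits be reported.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_hits(seq, tss_index):
    """Return (position relative to TSS, matched motif) for each TATAWAW match.
    Negative positions lie upstream of the transcription start site."""
    seq = seq.upper()
    return [(m.start() - tss_index, m.group(1)) for m in TATA_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    print(tata_hits("TATATTA", 0))        # noncanonical motif: no hit expected
    print(tata_hits("GGTATAAAAGG", 10))   # hypothetical sequence with a canonical hit
```

Real promoter-prediction tools use position weight matrices and other context, so a regex like this is only a first-pass filter.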

An isoform corresponding to RARβ5 does not seem to be present in the RARα and RARγ genes, although fragments cleaved from RARα and RARγ proteins were detected. Whether these fragments are functional, and what their biochemical properties are, remains to be determined. The RARβ5 isoform seems to be unique among all of the RARs and not a cleavage product of other RARβ isoforms, which suggests that the RARβ gene might have functions different from those of other RARs. The expression and regulation of RARβ protein is an important issue for functional research on this receptor. We did not observe significant regulation of RARβ5 protein by atRA after 24 hours of treatment with or without proteasomal inhibition, suggesting that translational regulation is independent of transcriptional regulation under these experimental conditions. Post-translational regulation of RARβ by retinoids will be addressed in depth in future studies.

In summary, we have identified a novel, unique RARβ isoform (RARβ5) and mapped its promoter region. We have also performed an initial characterization of its expression and transcriptional regulation in normal and cancerous breast epithelial cells. The identification of RARβ5 reveals an additional layer of complexity in retinoid signaling, and this isoform may serve as a potential target of retinoids in breast cancer prevention and therapy studies. Future studies of RARβ function should include analysis of the RARβ5 isoform in both normal and tumor cells and of its response to retinoids. Effective inhibition of RARβ5 might be necessary for the prevention and treatment of breast and other cancers with retinoids.
Abstract. Background: Glutathione-S-transferases (GSTs) and N-acetyltransferases (NATs) are involved in the metabolism of a wide range of carcinogenic chemicals. Allelic polymorphism of these enzymes is associated with variations in enzyme activity and hence may affect the concentration of activated carcinogenic chemicals in the body. Previous studies suggest a possible cancer risk-modifying effect of these allelic polymorphisms, but the results are still controversial. We evaluated the effect of GSTM1, GSTT1, GSTP1, NAT1 and NAT2 enzymes on individual susceptibility to colorectal cancer, with particular attention to possible interactions between the studied genotypes. Materials and Methods: Five hundred colorectal cancer patients and 500 matched cancer-free controls were included in the study. The allelic polymorphisms of the GSTM1, GSTT1, GSTP1, NAT1 and NAT2 enzymes were determined by PCR-based methods in peripheral blood leukocytes, and allelic distributions were compared between colorectal cancer patients and controls. Results: The GSTM1 0 allele (OR: 1.48, 95% CI: 1.15-1.92) and rapid acetylator genotypes of NAT2 (OR: 1.51, 95% CI: 1.17-1.98) were associated with an elevated risk. No statistically significant association was found between NAT1, GSTP1 or GSTT1 genotypes and colorectal cancer. A markedly increased risk was associated with the GSTM1 0 allele-NAT2 rapid acetylator genotype combination (OR: 2.39, 95% CI: 1.75-3.26) and with the GSTM1 0 allele-NAT2 and NAT1 rapid acetylator triple combination (OR: 3.28, 95% CI: 2.06-5.23). Carrying 4 or 5 putative "high-risk" alleles substantially increased the risk of colorectal cancer (OR: 3.69, 95% CI: 2.33-5.86). Conclusion: The genotype of certain metabolizing enzymes affects the risk of colorectal cancer. This effect is particularly important when certain allelic combinations are studied. In the near future, individual-level risk assessment may become possible by further increasing the number of studied polymorphisms and combining them with traditional epidemiological risk factors.

Correspondence to: István Kiss, Department of Public Health, Faculty of Medicine, Pécs University of Sciences, Szigeti Str. 12, H-7643 Pécs, Hungary. Tel: (36) 72 536 395, Fax: (36) 72 536 395, e-mail: istvan.kiss@aok.pte.hu

Key Words: Colorectal cancer, metabolizing enzymes, cancer susceptibility, N-acetyltransferase, glutathione-S-transferase.

It is generally accepted that cancer risk is determined by the interaction of environmental and genetic factors. Except for hereditary tumors, external carcinogenic exposure is involved in human tumorigenesis. Carcinogenic chemicals, however, undergo a complicated process of metabolism in the human body. Typically, these chemicals are activated by the so-called phase I metabolizing enzymes, resulting in the formation of electrophilic, reactive compounds (1). The amount of active carcinogens correlates well with the risk of DNA damage and cancer formation. Detoxifying enzymes (phase II enzymes) help remove carcinogens from the body (2). Most of these enzymes conjugate the carcinogenic chemical with a small molecule, making it less toxic and more water soluble. It therefore seems a logical assumption that detoxifying capacity determines, to a certain extent, individual susceptibility to cancer.

The activity of detoxifying enzymes in humans is basically determined by the genotype of the enzyme (2). Most of our metabolizing enzymes are genetically polymorphic, encoding proteins with different activities (2). Among the phase II enzymes, the glutathione-S-transferase (GST) superfamily and the N-acetyltransferases (NATs) have long been suspected of influencing cancer susceptibility (3-9).
Table I. Allelic distributions of the studied GST and NAT enzymes in the control group.

Table II. Allelic distributions of the studied GST and NAT enzymes among colorectal cancer patients.
The GST enzymes have a relatively wide range of substrates, e.g. polycyclic aromatic hydrocarbons, monohalomethanes, ethylene oxide, various solvents and pesticides (10). The superfamily consists of 6 families: α, μ, π, σ, θ and ζ. From a carcinogenesis point of view, GSTM1, GSTT1 and GSTP1, from the μ, θ and π families respectively, are probably the most important enzymes. In Caucasian populations, almost half of individuals have no functional GSTM1 enzyme because of a homozygous deletion in the gene (0 genotype) (11, 12). The situation is similar for the T1 enzyme, but the proportion of persons with the 0 genotype is lower (13). The GSTP1 enzyme has two single-base polymorphisms, both resulting in an amino acid change in the protein (Ile105Val, Ala114Val) (14). For the more frequently studied Ile105Val polymorphism, the enzyme encoded by the Val allele exhibits lower activity. In accordance with this finding, certain tumors (e.g. lung, bladder) appeared to occur at higher rates among carriers of the Val allele than among persons with the Ile genotype (15, 16).

The N-acetyltransferases can catalyze N- and O-acetylation, the former considered a detoxifying reaction and the latter an activating one (17). Their substrates include known carcinogenic compounds such as aromatic and heterocyclic amines (17). In the NAT family, polymorphisms of NAT2 are well characterized, but the NAT1 enzyme has only recently been studied from this point of view. Both NATs have several alleles. For NAT2, the association of genotypes with enzyme activity is also well established, and people are usually categorized as rapid or slow acetylators (18). The relationship between NAT1 alleles and acetylation speed is less clear, but certain alleles also seem to be associated with the phenotype (19, 20).

Previous studies have tried to find an association between the risk of different cancer types and the allelic polymorphism of GST and NAT enzymes. For colorectal tumors, most studies suggested an elevated risk for individuals with the GSTM1 0 genotype (21-24). On theoretical grounds (O-acetylation of heterocyclic amines present in the GI system), rapid acetylators should also be at higher risk, but the results are controversial (25-28).

In the present case-control study, we tried to characterize the role of GSTM1, GSTT1, GSTP1, NAT1 and NAT2 polymorphisms in determining susceptibility to colorectal cancer. Carriers of 0 alleles for the GST enzymes have a decreased detoxifying capacity. If this is combined with the rapid formation of heterocyclic amine metabolites associated with the rapid acetylator genotype, individuals with certain allelic combinations might be at particularly high risk. Earlier, we demonstrated similar interactions between cytochrome P450 1A1 (CYP1A1), cytochrome P450 2E1 (CYP2E1) and GSTM1 alleles (29). The main goal of the present study was to find such allelic combinations and to quantitatively assess their effect on colorectal cancer risk.

Materials and Methods

Five hundred colorectal cancer patients from the Central Hospital of the Ministry of Internal Affairs and from the area of Baranya and Vas County, Hungary, were included in the study. The diagnosis of tumors was always confirmed histologically. Patients with conditions affecting colorectal cancer risk (familial adenomatous polyposis, hereditary non-polyposis colorectal cancer, ulcerative colitis, etc.) were excluded from the study. Five hundred cancer-free controls from the same regions (non-cancer patients from in- or outpatient wards and volunteers for health status examination) were matched to the cases by age, sex, smoking habits and red meat consumption. Ten ml of peripheral blood was drawn from each participant, white blood cells were isolated by repeated centrifugation with 0M% ammonium chloride, and DNA was isolated (30).

GSTM1 and GSTT1 genotyping (31) was performed by simultaneous amplification in the presence of an internal control (a 268-bp fragment of the β-globin gene), with the following primers: GSTM1-F: GAACTCCCTGAAAAGCTAAAGC, GSTM1-R: GTTGGGCTCAAATATACGGTCG, GSTT1-F: TTCCTTACTGGTCCTCACATCTC, GSTT1-R: TCACCGGATCATGGCCAGCA, β-globin-F: CAACTTCATCCAGGTTCACC, β-globin-R: GAAGAGCCAAGGACAGGTAC. The reaction was performed in a 20 μl volume containing 1.5 mM MgCl2, 10 mM Tris-HCl (pH 8.3), 2 mg/ml bovine serum albumin, 4 × 0.25 mM dNTP, 2 U Taq DNA polymerase, 30 pmol each of the GSTT1-F and GSTT1-R primers, 50 pmol each of the GSTM1-F and GSTM1-R primers, 20 pmol each of the β-globin-F and β-globin-R primers, and 13 μl DNA template. After a 7-min denaturation at 94°C, 35 PCR cycles were performed (60 sec at 94°C, 60 sec at 60°C, 60 sec at 72°C), followed by 5 min at 72°C.
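The β-globin internal control in this multiplex reaction is what makes a missing GSTM1 or GSTT1 band interpretable. Without it, a failed PCR would look the same as a homozygous deletion. The sketch below shows that calling logic, assuming each lane has already been scored for band presence. The function names and the "+"/"0" labels are hypothetical. Because the assay is presence/absence, "+" cannot distinguish one gene copy from two.

```python
from typing import Dict, Optional


def call_null_genotype(target_band: bool, control_band: bool) -> Optional[str]:
    """Call GSTM1 or GSTT1 status from one multiplex PCR lane.

    Returns None (no call) when the beta-globin internal control is missing,
    "0" for a homozygous deletion, and "+" when at least one copy is present.
    """
    if not control_band:
        return None  # reaction failed, so a missing target band is uninformative
    return "+" if target_band else "0"


def call_lane(gstm1_band: bool, gstt1_band: bool, globin_band: bool) -> Dict[str, Optional[str]]:
    """Call both genes from a single lane that shares one internal control."""
    return {
        "GSTM1": call_null_genotype(gstm1_band, globin_band),
        "GSTT1": call_null_genotype(gstt1_band, globin_band),
    }


if __name__ == "__main__":
    print(call_lane(gstm1_band=False, gstt1_band=True, globin_band=True))
```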

For GSTP1, the Ile105Val polymorphism was determined by PCR-RFLP (32). A 176-bp fragment was amplified with the following primers: 5'-ACCCCAGGGCTCTATGGGAA-3' and 5'-TCAGGGCACAAGAAGCCCCT-3'. The reaction was carried out in a 30 μl total volume containing 50 ng DNA template, 4 × 200 μM dNTP, 200 ng of each primer, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2 and 1 U Taq DNA polymerase. The PCR parameters were as follows: 10 min at 95°C, then 30 cycles of 30 sec at 94°C, 30 sec at 55°C and 30 sec at 72°C, followed by a final extension step at 72°C for 10 min.

Table III. Risk of colorectal cancer by genotypes of GST and NAT enzymes.

Enzyme    Odds ratio    95% confidence interval
GSTM1     1.48          1.15-1.92
GSTT1     1.29          0.95-1.74
GSTP1     1.11          0.85-1.43
NAT1      1.14          0.88-1.48
NAT2      1.51          1.17-1.98

Table IV. Number of putative "high-risk" alleles per person (0 to 5) in the control and case groups.

The NAT2 allelic polymorphism was studied by restriction fragment length polymorphism analysis (33). First, nested PCR was used to amplify a 547-bp fragment of the gene (outer primer set: 5'-AATTAGTCACACGAGGA-3' and 5'-GCAGAGTGATTCATGCTAGA-3'; inner set: 5'-GCTGGGTCTGGAAGCTCCTC-3' and 5'-TTGGGTGATACATACACAAGGG-3'; 25 cycles of 30 sec at 94°C, 30 sec at 5y*C and 45 sec at 72°C with the outer set were followed by 35 cycles with the inner set using the same parameters). The NAT2*4 (wild-type), NAT2*5, NAT2*6 and NAT2*7 alleles were identified by restriction endonuclease digestion with KpnI, TaqI, DdeI and
BamHI enzymes. Homozygous or heterozygous carriers of the wild-type allele were characterized as rapid acetylators.

NAT1 genotyping was also performed using a nested PCR-based RFLP (33), similarly to the NAT2 genotyping, with the following primers: outer: 5'-GATCAAGTTGTGAGAAGAAATCGG-3' and 5'-CTAGCATAAATCACCAATTTCCAAG-3'; inner: 5'-GACTCTGAGTGAGGTAGAAAT-3' and 5'-CCACAGGCCATCTTTACAA, in which the underlined base creates an additional MboII restriction site upon amplification of the NAT1*4 allele. The NAT1*4, NAT1*10 and NAT1*11 alleles were identified by this method; certain rare alleles, such as NAT1*3 and NAT1*14, were not studied. The presence of a NAT1*10 or NAT1*11 allele indicated the rapid acetylator genotype.
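The genotype-to-phenotype rules in the NAT genotyping paragraphs can be written as two small functions. The NAT1 rule follows the text: a NAT1*10 or NAT1*11 allele indicates a rapid acetylator. For NAT2, the sketch assumes the common convention that carrying at least one wild-type NAT2*4 allele means a rapid acetylator and that all other genotypes are slow. Treat that rule as an assumption, not a quotation of the authors' protocol. Allele labels are plain strings chosen for illustration.

```python
RAPID_NAT1_ALLELES = {"NAT1*10", "NAT1*11"}


def nat2_status(allele1: str, allele2: str) -> str:
    """Assumed convention: at least one wild-type NAT2*4 allele -> rapid acetylator."""
    return "rapid" if "NAT2*4" in (allele1, allele2) else "slow"


def nat1_status(allele1: str, allele2: str) -> str:
    """Per the text: a NAT1*10 or NAT1*11 allele indicates a rapid acetylator."""
    return "rapid" if RAPID_NAT1_ALLELES & {allele1, allele2} else "slow"


if __name__ == "__main__":
    print(nat2_status("NAT2*4", "NAT2*5"), nat1_status("NAT1*4", "NAT1*4"))
```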

Statistical calculations were made with Epi Info 6 (CDC, Atlanta, USA) and SPSS PC+ software. Odds ratios and 95% confidence intervals were used to compare the occurrence of genotypes in the case and control groups. The GSTM1 and GSTT1 + genotypes, the GSTP1 homozygous Ile genotype and, for the NAT enzymes, the slow acetylator genotypes were considered the baseline risk category.
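The odds ratios reported in this study come from 2 × 2 tables of genotype by case/control status. The sketch below computes an odds ratio with a Woolf (log-based) 95% confidence interval. Woolf is a standard method, but it is an assumed choice here: Epi Info 6 also reports other interval types, such as Cornfield limits, and the text does not say which interval was used. The function name and argument order are illustrative.

```python
from math import exp, log, sqrt


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Woolf interval is undefined when any cell is zero")
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


if __name__ == "__main__":
    # GSTM1 0 plus NAT2 rapid: 161 of 500 cases vs 83 of 500 controls (from the Results).
    print(odds_ratio_woolf(161, 500 - 161, 83, 500 - 83))
```

Assuming all 500 participants in each group were genotyped, the example reproduces the reported OR of 2.39 for the GSTM1-NAT2 combination. The Woolf interval comes out at about 1.77-3.22, slightly narrower than the published 1.75-3.26. That difference is what one would expect if a different interval method was used.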



Results 

The allelic distributions in the control and case groups are shown in Tables I and II, respectively. The allele frequencies in the control group were similar to those reported in other studies of Caucasian populations. As illustrated in Table III, GSTM1 and NAT2 allelic distributions showed statistically significant differences between cases and controls. There were no statistically significant effects of GSTT1, GSTP1 and NAT1 allelic polymorphisms on colorectal cancer risk. Forming subgroups of NAT1 and NAT2 slow or rapid acetylators by exact genotype did not yield any further statistically significant result (data not shown).

In the analysis of the joint effect of allelic combinations, GSTM1 and NAT2 alleles seemed to substantially strengthen each other's effect. In the control group there were only 83 people with both "high-risk" alleles, whereas 161 such persons were found among the cases (OR: 2.39, 95% CI: 1.75-3.26). Paired analyses of GSTT1-GSTP1, GSTT1-NAT1 and GSTP1-NAT1 were also performed, but none of these combinations showed a statistically significant difference between cases and controls (data not shown). Among the triple combinations, GSTM1-NAT2-NAT1 produced the most remarkable difference, with an OR of 3.28 (95% CI: 2.06-5.23) for the simultaneous presence of the three "high-risk" alleles, suggesting a further risk-increasing effect of the third allele.

Because the analysis of allelic combinations suggested a possible interaction between the studied polymorphisms, we constructed a table based on the number of putative "high-risk" alleles per person among cases and controls (Table IV). The table clearly shows that persons with several "high-risk" alleles are relatively frequent among cases, whereas the control group mainly contains persons with fewer "high-risk" genotypes. The number of individuals with 4 or 5 "high-risk" alleles differed significantly between cases and controls (OR: 3.69, 95% CI: 2.33-5.86). Interestingly, participants with fewer than 2 "high-risk" alleles were not significantly protected from developing colorectal cancer (OR: 0.93, 95% CI: 0.70-1.23).
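The Table IV analysis above reduces each person's five genotypes to a single count of putative "high-risk" alleles. The excerpt does not define the high-risk category for every gene. The sketch below therefore assumes the following: GSTM1 0, GSTT1 0, carriage of a GSTP1 Val allele (the non-baseline GSTP1 category), NAT1 rapid, and NAT2 rapid, with each gene contributing at most one. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Genotype:
    gstm1_null: bool  # GSTM1 0 genotype
    gstt1_null: bool  # GSTT1 0 genotype
    gstp1_val: bool   # at least one GSTP1 Val105 allele (assumed high-risk category)
    nat1_rapid: bool  # NAT1 rapid acetylator
    nat2_rapid: bool  # NAT2 rapid acetylator


def high_risk_count(g: Genotype) -> int:
    """Number of putative high-risk categories a person falls into (0 to 5)."""
    return sum([g.gstm1_null, g.gstt1_null, g.gstp1_val, g.nat1_rapid, g.nat2_rapid])


def tabulate(people: Iterable[Genotype]) -> List[int]:
    """Counts of people with 0, 1, ..., 5 high-risk alleles, as in Table IV."""
    counts = [0] * 6
    for g in people:
        counts[high_risk_count(g)] += 1
    return counts


def four_or_more(people: Iterable[Genotype]) -> int:
    """People in the 4-or-5 group that the reported OR of 3.69 contrasts with the rest."""
    return sum(1 for g in people if high_risk_count(g) >= 4)
```

Tabulating cases and controls separately and then splitting at 4 high-risk alleles gives the 2 × 2 table behind the OR of 3.69. Splitting below 2 gives the table behind the OR of 0.93.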



Discussion 

In our matched case-control study, carrying GSTM1 0 alleles or being a NAT2 rapid acetylator was associated with an elevated risk of colorectal cancer in the studied Hungarian population. Unfortunately, several studies in the field are not directly comparable with each other. Some are not matched studies, and when matching is applied, the matching variables may differ. Further discrepancies may arise from the different study populations: allelic distributions might differ substantially, not only for the studied polymorphism but also for other genes that may modify the risk of colorectal tumors.

These problems are evident in previous studies exploring the role of GSTM1 as a cancer risk modifier. The picture is confusing: some studies suggested an association between the 0 genotype and increased risk (34, 35), while others did not find any correlation (13). Our study, with relatively high case numbers, supports the hypothesis that the GSTM1 0 genotype is a risk factor for colorectal cancer. This is in accordance with the detoxifying role of GSTM1 in the metabolism of carcinogenic substances.

The effect of the GSTT1 polymorphism was not statistically significant, although it approached significance (OR: 1.29, 95% CI: 0.95-1.74). Such results always raise the question of whether a larger sample would yield a statistically significant association. Unfortunately, studying low-penetrance genes in human populations is fairly difficult, because existing associations might be missed owing to several confounding factors and the heterogeneity of the study population. This emphasizes the importance of comparing different studies and combining them in meta-analyses. Concerning GSTT1, it is of interest that, although its effect approached statistical significance, it showed no effect in double or triple combinations. NAT1, by contrast, had a weaker effect alone but was part of a triple combination (GSTM1 0 genotype - NAT2 rapid acetylator - NAT1 rapid acetylator) that was associated with substantially elevated risk. Despite the negative results of our study for the total sample, GSTT1 might be a risk modifier in certain subpopulations with heavy exposure to carcinogenic substances that are substrates of the GSTT1 enzyme.
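The sample-size question raised above can be given a rough magnitude using only the reported GSTT1 estimate. The sketch below makes several simplifying assumptions. It treats the published interval as a Wald interval symmetric on the log scale (0.95-1.74 around 1.29 is only approximately so). It assumes the OR stays at 1.29 and that the standard error shrinks as 1/sqrt(n) with unchanged genotype frequencies and case:control ratio. Under these assumptions the implied z is about 1.65. Reaching 1.96 would need roughly 1.4 times as many participants. This is a back-of-the-envelope estimate, not a formal power calculation.

```python
from math import log


def implied_z(or_point, lower, upper, z_crit=1.96):
    """Wald z statistic implied by a 95% CI assumed symmetric on the log scale."""
    se = (log(upper) - log(lower)) / (2 * z_crit)
    return log(or_point) / se


def sample_size_multiplier(z_observed, z_target=1.96):
    """Factor by which n must grow (SE ~ 1/sqrt(n)) for the same OR to reach z_target."""
    return (z_target / z_observed) ** 2


if __name__ == "__main__":
    z = implied_z(1.29, 0.95, 1.74)  # GSTT1 estimate from the text
    print(round(z, 2), round(sample_size_multiplier(z), 2))
```

With z_target = 1.96 the same estimate would only just reach p < 0.05, which corresponds to about 50% power. Using z_target = 1.96 + 0.84 (roughly 80% power) gives a multiplier of about 2.9.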

GSTP1 polymorphisms have not been considered to play an important role in human colorectal carcinogenesis, although the allelic polymorphism is associated with differences in the activity of the encoded enzymes. Here, we must not forget the recently explored role of GSTs in cell signaling pathways, independent of their glutathione-S-transferase activity (36). GSTP1 is involved in the regulation of the MAP kinase pathway by forming a complex with c-Jun N-terminal kinase. In the context of human cancer, GST enzymes acting as intracellular regulatory proteins have been studied as possible factors influencing the response to cytostatic treatment. This aspect of GST function has not been studied from the point of view of cancer risk or cancer prevention. Nor do we know whether allelic polymorphisms of GSTs affect their function as intracellular regulators. Answering these questions might further help explain the population-level effects of GST alleles as cancer risk modifiers.

While the GST enzymes are important detoxifiers of metabolites of polycyclic aromatic hydrocarbons, NATs are involved in the metabolism of aromatic and heterocyclic amines. These compounds are present in our diet or are formed during food preparation, and NATs are present in the colorectal mucosa. There is therefore a plausible mechanistic link that could explain the role of NAT polymorphisms in human carcinogenesis.

Allelic polymorphism of the NAT2 enzyme has been known for a long time. It was first detected phenotypically, from the distribution of enzyme activity in healthy subjects, and these activity differences were later linked to an allelic polymorphism (37). Since NAT2 activates heterocyclic amines, rapid acetylators might be at higher risk of developing colorectal cancer. In our study, the NAT2 polymorphism proved to be the strongest factor affecting colorectal cancer risk. In recent years, the role of NAT2 seemed to have been clarified by the model mentioned above, which was also supported by epidemiological and molecular epidemiological data. Particular importance was attributed to NAT2 in individuals with high red meat and/or well-done meat consumption (38), since these dietary constituents contain heterocyclic amines and thus serve as sources of carcinogenic exposure. Some studies, however, complicate the picture: a meta-analysis by D'Errico et al. found that the NAT2 polymorphism was not a significant risk factor (OR: 1.03, 95% CI: 0.93-1.14) (39), while a recent study of
Sachse et al. did not find an association between NAT2 alleles and colorectal tumorigenesis (OR: 0.82, 95% CI: 0.69-1.12) (40), although it still maintained the connection between red meat consumption and colorectal cancer.

NAT1 was originally believed to be monomorphic because of the unimodal distribution of its activity in the populations studied. Recently, several alleles have been identified and variations in enzyme activity have also been demonstrated. However, the phenotypic variations (enzyme activity differences) were smaller than those measured for NAT2 alleles (19, 20). Some recent studies have also tried to demonstrate an association between NAT1 alleles and cancer risk. The results are controversial. Some studies identified NAT1 variants as risk factors (mainly the NAT1*10 allele was studied) (41, 42), while others did not demonstrate any association at all (43, 44). Further confusion is caused by discrepancies in the genotype-phenotype relationships reported by different authors. The NAT1*10 allele is generally considered to be associated with higher activity, but some results seem to contradict these
findings (45). Similarly, the activity of the NAT1*11 allele is questionable. These contradictory results might be caused by tissue-specific differences in the expression of NAT enzymes, as suggested by Bruhn et al. (45). We also performed an allele-specific analysis for NAT1 and NAT2, and it yielded the same associations as the broad categories (rapid and slow acetylators). Misclassification error caused by erroneously assigning a genotype to the "slow" or "rapid" acetylator group can therefore be ruled out in our study.

Probably the most important part of studying the effects of low-penetrance genes is the analysis of possible interactions between the investigated alleles. This might lead to individual-level risk assessment by providing a more precise estimate of risk. From a practical point of view, the question is whether we can identify genetic conditions (allelic combinations) that considerably increase a person's cancer risk. In our study, such conditions included a triple combination with an OR of 3.28. A simple but very effective method of risk estimation is to count the "high-risk" alleles carried simultaneously. This method has the advantage of taking every existing interaction into account and studying the alleles as they actually occur together. It does so without introducing further sources of error through complicated mathematical modeling.
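The allele-counting approach described above can be sketched as follows. This is a hypothetical illustration, not the authors' code. The locus names, the genotype labels treated as "high-risk" and the grouping of counts into bins are all assumptions made for the example. Each subject is represented as a dictionary of genotype calls. Each count bin is compared with the lowest bin as reference using a crude, unadjusted odds ratio, which is not necessarily how the published ORs were estimated.

```python
from collections import Counter


def count_high_risk(subject, high_risk):
    """Number of loci at which `subject` carries the genotype listed in `high_risk`."""
    return sum(1 for locus, label in high_risk.items() if subject.get(locus) == label)


def odds_ratios_by_count(cases, controls, high_risk, bins):
    """Crude OR for each allele-count bin versus the first (reference) bin.

    bins: list of (name, min_count, max_count) with inclusive bounds.
    Returns {bin_name: OR}, with None where a zero cell leaves the OR undefined.
    """
    def bin_of(n):
        for name, lo, hi in bins:
            if lo <= n <= hi:
                return name
        raise ValueError(f"count {n} falls outside all bins")

    case_n = Counter(bin_of(count_high_risk(s, high_risk)) for s in cases)
    ctrl_n = Counter(bin_of(count_high_risk(s, high_risk)) for s in controls)
    ref = bins[0][0]
    result = {}
    for name, _, _ in bins[1:]:
        num = case_n[name] * ctrl_n[ref]
        den = case_n[ref] * ctrl_n[name]
        result[name] = num / den if den else None
    return result


# Hypothetical "high-risk" genotype at each locus (illustrative only).
HIGH_RISK = {"GSTM1": "null", "GSTT1": "null", "NAT2": "rapid", "NAT1": "rapid"}
```

Because every subject contributes only a count, any interaction among the loci is reflected in the count categories without a parametric interaction model. That is the property claimed for the method in the text.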

Our results (Table IV) support the hypothesis that even allelic polymorphisms without a significant influence on the risk of colorectal tumors may contribute slightly to the modulation of the final risk. They may do so under certain still unknown circumstances or through not yet determined interactions. In our study, we demonstrated a substantially elevated risk in carriers of 4 or 5 "high-risk" alleles (OR: 3.69), but this still did not reach the "level of intervention". The results, however, give us hope for the near future. Genotyping several polymorphisms simultaneously, together with analysis of known traditional epidemiological risk factors, may allow us to estimate individual susceptibility to the most important cancer types. This would permit individually tailored preventive strategies or the development of screening programs to identify "high-risk" individuals.
N- and C-Terminal Isoforms of Arg Quantified
by Real-Time PCR Are Specifically Expressed
in Human Normal and Neoplastic Cells,
in Neoplastic Cell Lines, and in HL-60
Cell Differentiation
The human ABL2 (or ARG) gene codes for a nonreceptor tyrosine kinase. It is involved in translocations with the ETV6 gene in human leukemia and has altered expression in several human carcinomas. Two isoforms of Arg with different N-termini (1A and 1B) have been described. The C-terminal domain of Arg contains two F-actin-binding sequences that mediate a number of actions related to cell morphology and motility by interacting with actin filaments. Using reverse transcription (RT)-polymerase chain reaction (PCR) analysis of human Arg mRNA, we identified specific cDNAs of different sizes in hematopoietic, epithelial, nervous, and fibroblastic cells. Some of these cDNAs showed an additional alternative splicing event involving the 63 bp sequence of exon II. This led to four cDNA types with different N-termini: 1A long and short, and 1B long and short. Other cDNAs lacked a 309 bp sequence in the last exon involving one of the C-terminal F-actin-binding domains. This gave rise to two cDNA types: C-termini long and short. Quantified by real-time PCR (quantitative RT-PCR), these Arg transcript isoforms have specific expression patterns in different normal and tumor cell types, and also during cell differentiation and growth arrest. These isoforms maintained their open reading frames, and eight putative proteins were predicted. The different C-terminal isoforms seem to retain the same quantitative reciprocal ratio of their respective transcripts. The Arg protein isoforms with different C-terminal actin-binding domains and different N-termini might have specific cellular localizations or concentrations. They might also have differently regulated catalytic activity, with different implications in normal and neoplastic cells. © 2005 Wiley-Liss, Inc.

Key words: Arg tyrosine kinase; mRNA splicing; transcript expression; protein expression; actin-binding sequence 



INTRODUCTION 

The Abelson family of nonreceptor tyrosine protein kinases is defined by the products of the human and mouse ABL2 (also known as ARG, Abelson-Related Gene) and ABL1 genes, and of the Drosophila and nematode ABL genes [1-3]. In human acute leukemia, the ABL2(ARG) gene can be involved in translocations with the ETV6 gene and produce different chimeric proteins [4-6]. Altered expression of Arg transcripts has been described in different tumors [7-10]. The human Arg protein has a high degree of amino acid sequence identity (90%-94%) with c-Abl in the tyrosine kinase, SH2, and SH3 domains [3]. Two isoforms of both human Arg and c-Abl have been described, with different N-termini called 1A and 1B [3,11]. Four mouse c-Abl isoforms with different 5'-ends, arising from the use of alternative 5'-exons, have been cloned [12]. The long C-terminal domain of Arg is quite different from that of c-Abl. Nevertheless, both contain three proline-rich sequences that bind to the SH3 domains of adaptor proteins [13,14]. c-Abl contains only one F-actin-binding sequence, whereas Arg contains two, plus one microtubule-binding sequence [1,15]. The Arg product is located in the cytoplasm [16], whereas c-Abl is also found in the nucleus.
The functional role of Arg is currently under investigation. Through its interactions with actin filaments, it performs actions redundant with those of c-Abl, playing a role in neurulation [17]. It functions in adhesion-dependent neuritogenesis [18] and in synaptic structure and function [19,20]. Arg seems to be required for bacterial pathogenesis [21]. Together with c-Abl, it also seems to be involved in the oxidative stress response [22] by regulating catalase activity [23]. Suppression of Arg kinase activity by STI571 induces cell cycle arrest [24], and Arg appears to play a role in homologous recombination DNA repair [25]. Lymphopenia occurs during the development of mice harboring a homozygous disruption of c-Abl [26], indicating that Arg is unable to substitute for c-Abl functions in lymphoid tissues. Arg is ubiquitously expressed, with the greatest expression in nervous tissues [27]. Arg mRNA increases during granulocytic and macrophage-like differentiation of HL-60 cells [28], and its expression is higher in mature than in immature B lymphoid cell lines [29].

During reverse transcriptase (RT)-polymerase chain reaction (PCR) analyses of human Arg mRNA, we identified specific cDNAs of different sizes. These showed an additional splicing event immediately downstream of both the alternatively spliced 1A and 1B exons. They also showed the lack of a sequence in the last exon coding for the C-terminus. These events gave rise to four cDNA types diverging at the 5'-end and two cDNA types diverging in the 3'-region. Their open reading frames were maintained, and the possible combinations of the different splicing events made it possible to predict eight putative proteins. The Arg transcript isoforms were quantified by real-time PCR (quantitative RT-PCR). They were differentially expressed under diverse physiological conditions and in normal and tumor cells.
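The figure of eight putative proteins follows from combining three binary splicing choices: first exon 1A or 1B, exon II present (long) or absent (short), and C-terminus long or short. A minimal sketch follows, using the isoform labels from this paper. It assumes, as the prediction does, that any 5'-end choice can occur on the same transcript as either 3'-end form. The 5'- and 3'-end RT-PCR assays measure each end separately, so actual co-occurrence on a single mRNA is an assumption of this enumeration.

```python
from itertools import product

FIRST_EXON = ["1A", "1B"]      # alternative first exons
EXON_II = ["L", "S"]           # L keeps the 63 bp exon II segment, S lacks it
C_TERMINUS = ["CTL", "CTS"]    # CTS lacks the 309 bp C-terminal fragment


def putative_isoforms():
    """Every combination of the three splicing choices, labelled like '1AL/CTS'."""
    return [f"{first}{size}/{ct}"
            for first, size, ct in product(FIRST_EXON, EXON_II, C_TERMINUS)]


for name in putative_isoforms():
    print(name)
```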

MATERIALS AND METHODS 

Cells, Tissues, and Human Cell Lines 

Unpooled samples of lymphocytes, monocytes, and granulocytes were obtained from volunteer donors by density gradient separation in Ficoll-Hypaque and Percoll as described [30,31]. Purity (>90%) was determined microscopically after May-Grünwald-Giemsa staining. The leukemic blast cells were obtained at diagnosis from the bone marrow of patients affected by acute myelogenous leukemia. They were separated as the mononuclear fraction by sedimentation on a Ficoll-Paque gradient. The leukemic blasts were >90% of total cells and were characterized as myeloid (M1) and monocytic (M5) blasts according to the FAB classification [32]. Tumor tissue specimens (renal clear cell carcinoma, Grade 2; colon carcinoma, Dukes histopathological stage A, Grade 1) and the corresponding normal tissues (renal cortex; colon mucosa) were obtained from patients soon after surgical treatment. Fresh tissue fragments were immediately put into RNAlater (Ambion, Austin, TX) and frozen in liquid nitrogen.

The human cell lines used (Table 1) were cultured in RPMI 1640 medium supplemented with 10% fetal calf serum and tested during exponential growth. The following procedures were performed as previously described [28]: assessment of growth characteristics; differentiation of HL-60 cells to granulocytes by 4-d treatment with 1 µM all-trans retinoic acid (ATRA); differentiation to macrophage-like cells by 2-d treatment with 10 nM 12-O-tetradecanoylphorbol-13-acetate (TPA); and immunofluorescence analysis of membrane CD11b marker expression. The GFD8 cell line depends on granulocyte-macrophage colony-stimulating factor (GM-CSF) for growth. It was cultured either in the presence of 5 ng/ml GM-CSF or for 4 d in its absence. Removal of the growth factor for 4 d led to reversible growth arrest in the absence of differentiation [34,35]. The growth characteristics of GFD8 cells were assessed by daily cell counts and by expression of the CD11b differentiation marker [28]. ATRA, TPA, and GM-CSF were obtained from Sigma-Aldrich (St. Louis, MO).

RNA Extraction, cDNA Synthesis, 
and Qualitative RT-PCR Analysis 

Total RNA was obtained by cell and tissue ex- 
traction with TRIZOL (Invitrogen, Carlsbad, CA) 

Table 1. Characteristics of Human Cell Lines Utilized

B lymphoid
  LP-1 (myeloma, mature plasma cell phenotype)
  Raji (lymphoma, mature B cell phenotype)
  ALLPO (acute leukemia, immature early pre-B cell phenotype)
T lymphoid
  Jurkat (acute leukemia, mature post-thymic phenotype)
  Molt-4 (acute leukemia, immature thymocyte phenotype)
Myeloid
  K562 (chronic leukemia, erythroid lineage phenotype)
  HL-60 (acute leukemia, granulocytic lineage phenotype)
  GFD8 (acute leukemia, granulocytic lineage phenotype)
  U937 (histiocytic lymphoma, monocytic lineage phenotype)
Neuronal
  A-172 (glioblastoma)
  Lan-5 (neuroblastoma)
Epithelial
  Caki-1 (renal cell carcinoma, clear cells)
  Hela (cervix carcinoma)
Fibroblastic
  Hel 299 (lung embryonic fibroblast)

The cell lines were from the American Type Culture Collection, with the following exceptions. LP-1 and Lan-5 were from the German Collection of Microorganisms and Cell Cultures. ALLPO [33] and GFD8 [34] were a kind gift from A. Biondi (Milano-Bicocca University, Monza, Italy).
according to the manufacturer's instructions. The RNA was quantified spectrophotometrically, and its integrity was analyzed by electrophoresis in 1% agarose gel. DNase treatment of total RNA and reverse transcription were performed as previously described [28]. Reverse transcription used an 8 µg aliquot of DNA-free RNA in a 40 µl reaction in the presence of 0.5 µg of random hexamers. Then, 2.5 µl of cDNA was amplified in the presence of 0.4 µM primers, 2 mM MgCl2, 0.2 µM dNTP, 2.5 U Taq Gold polymerase, and 1x manufacturer's buffer (Applied Biosystems, Foster City, CA). The primers, used in the combinations described in Figures 1 and 2, had the following sequences [3] and localizations:

41N 5'-ACACAGGTCCATGGTACC-3' reverse (exon IV)
42N 5M3CAG AGATCAGGACACIT-3' sense (exon 1A)
132N 5'-AAGCTCCGfGGGCTCCAGOy sense (exon 1B)
112N 5'-CACCAGGGATAGGAAGGGG-3' sense (exon XII)
113N 5'-AAGGGTCATTOXATC-3' reverse (exon XII)
114N 5'-CTGCTCTGGAAGCCccgtg-3' reverse (exon XII)
115N 5'-AOCAGATIXXOCrcnXXnX^3' reverse (exon XII)
The 41N, 42N, and 132N primers have an additional 5' eight-nucleotide tail containing the EcoRI restriction site. In primer 114N, the switch between capital and lower-case letters marks the fusion point of the sequences juxtaposed after the loss of a 309 bp fragment at the 3'-end of Arg cDNA. The amplification program was 95°C/10 min, (94°C/30 s, 60°C/30 s, 72°C/30 s) x 40 cycles, and 72°C/10 min. All the amplified cDNAs were sequenced with the ABI Prism BigDye Terminator v3.0 sequencing kit and the ABI Prism 3100-Avant Genetic Analyzer. The intron-exon junctions of ABL2(ARG) (Figure 1) were determined with the NCBI Genome Map Viewer Homo sapiens database, Build 34, Version 1 (http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606&query=arg).

Quantitative Real-Time PCR Analysis 

Real-time PCR with TaqMan chemistry was used 
to quantify specific cell mRNA. The amplification 
was performed in an ABI PRISM 7900HT Sequence 
Detector. One microliter of the RT reaction (corre- 




Figure 1. Schematic representation of the structure and cDNA coding sequences of the ABL2(ARG) gene. (A) The intron-exon structure of the ABL2(ARG) gene, derived from the NCBI Genome Map Viewer Homo sapiens database, Build 34, Version 1, and our own data. (B) Proposed isoforms of Arg proteins as predicted by the sequence of Arg cDNA obtained by RT-PCR analysis. The alternative first exons 1A and 1B are shaded. The alternatively spliced exon II and the C-terminal region (ΔCT) lacking in the CTS form are shown in black. SH3, SH2, and TPK indicate the SH3, SH2, and tyrosine protein kinase domains. The sense and reverse primers used in the RT-PCR are indicated with arrows. Primer 114N spans noncontiguous sequences. The positions of the two F-actin-binding regions in the C-terminal domain are also indicated (v...v). The diagrams are not to scale.
Figure 2. Detection of different ARG cDNA isoforms in various cell types by qualitative RT-PCR analysis. (A) 5'-end ARG isoforms. Left: The 41N/42N primer pair (see Figure 1) revealed two bands of 466 and 403 bp, corresponding to the 1AL and 1AS transcript isoforms. The 41N/132N primer pair amplified two bands of 523 and 460 bp, corresponding to the 1BL and 1BS transcript isoforms. Right: Amino acid sequences of the 1AL (top) and 1BS (bottom) isoforms. The in-frame nucleotide sequences at the splicing sites (denoted by capital and lower-case letters) are given. The 21 amino acids of exon II present in the 1AL form are indicated in bold. (B) 3'-end ARG isoforms. Left: The 112N/113N primer pair (see Figure 1) yielded two bands of 475 and 166 bp in A172 cells (lane 1). In these and other cell types, mRNA amplification with the 112N/114N (even lanes) and 112N/115N (odd lanes from lane 3) primer pairs revealed specific bands of 142 and 207 bp, corresponding to the CTS and CTL isoforms, respectively. Right: Amino acid sequence of the CTS isoform and the in-frame site of the nucleotide sequences juxtaposed (lower-case and capital letters) after the loss of the 309 bp fragment. The 21 amino acids lacking in the 1BS isoform and the 103 amino acids lacking in the CTS isoform are underlined. The amino acid numbers reported are those of the entire coding sequence.



sponding to 200 ng of cDNA) was amplified in a 50 ul 
PCR mixture containing 1x Universal PCR Master Mix (Applied Biosystems) and different concentrations of primers and probes (Table 2). The primer and probe sequences were selected with Primer Express 2.0 software (Applied Biosystems). The transcript of the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) housekeeping gene [36] was amplified as an endogenous control of RNA quality. Each cell line underwent at least two independent experiments, with each sample analyzed in triplicate. The real-time PCR conditions were 50°C/2 min (for optimal AmpErase uracil-N-glycosylase [UNG] activity) and 95°C/10 min, followed by 40 cycles of 95°C/15 s and 60°C/1 min. The ABI 7900 system software generated the threshold cycle (CT) values, which represent the number of cycles needed to detect the minimal amount of amplified material of a target transcript. The relative levels of the total Arg transcript in each sample were calculated with the averaged CT values of each sample [37]. Briefly, the averaged CT value of the GAPDH transcript was subtracted from the averaged CT value of the total Arg transcript of a specific cell type to obtain the Arg ΔCT value. Next, the difference (ΔΔCT) between the Arg ΔCT value of a specific cell type and that of the LP1 calibrator cell line was determined. It was expressed as 2^-ΔΔCT, the fold of Arg expression relative to the calibrator. The LP1 cell line was chosen as the calibrator at the outset of the study because it had the most mature phenotype among the lymphoid cell lines studied [29]. The relative amount of each Arg isoform was calculated as 2^-ΔCT [37]. Here, ΔCT was obtained by subtracting the averaged CT value of total Arg from that of the target isoform, and was then transformed into 2^-ΔCT. This value represents the amount of a single isoform with respect to the total quantity of Arg transcript and is expressed as a percentage. The amplification efficiencies for total Arg, each Arg isoform, and the GAPDH transcripts were determined according to the validation experiments suggested by Applied Biosystems (User Bulletin No. 2). They were approximately equal. To confirm the specificity of the PCR reaction, the products of the real-time PCR were electrophoresed on a 1.2% agarose gel.
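The two relative-quantification formulas above (2^-ΔΔCT for fold expression versus the calibrator, 2^-ΔCT for an isoform's share of total Arg) can be written out as a short sketch. The function names are illustrative. The sketch assumes what the validation experiments are meant to establish: target and reference amplify with approximately equal, near-100% efficiency, so each cycle corresponds to a twofold difference. The `delta_ct_slope` helper reflects the usual comparative-CT validation idea that ΔCT should stay roughly constant across a template dilution series (slope near zero). The exact acceptance threshold should be taken from the Applied Biosystems bulletin rather than from this sketch. Any CT values used with it are invented for illustration.

```python
from statistics import mean


def fold_vs_calibrator(arg_ct, gapdh_ct, cal_arg_ct, cal_gapdh_ct):
    """2^-ΔΔCT fold expression of total Arg relative to the calibrator (e.g. LP1).

    Each argument is a list of replicate CT values; replicates are averaged first.
    """
    d_ct_sample = mean(arg_ct) - mean(gapdh_ct)        # Arg ΔCT in the sample
    d_ct_cal = mean(cal_arg_ct) - mean(cal_gapdh_ct)   # Arg ΔCT in the calibrator
    dd_ct = d_ct_sample - d_ct_cal                     # ΔΔCT
    return 2 ** -dd_ct


def isoform_percent(isoform_ct, total_arg_ct):
    """2^-ΔCT share of one isoform relative to total Arg, as a percentage."""
    d_ct = mean(isoform_ct) - mean(total_arg_ct)
    return 100 * 2 ** -d_ct


def delta_ct_slope(log10_input, delta_ct):
    """Least-squares slope of ΔCT against log10 template input (validation check)."""
    mx, my = mean(log10_input), mean(delta_ct)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, delta_ct))
    sxx = sum((x - mx) ** 2 for x in log10_input)
    return sxy / sxx
```

For example, a sample whose Arg ΔCT is one cycle lower than the calibrator's gives a twofold expression. An isoform that crosses threshold two cycles after total Arg accounts for about 25% of the transcript.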

Western Blotting 

The cells were lysed with 1% Triton X-100, 10 mM Tris-HCl pH 7.4, 150 mM NaCl, and Protease Inhibitor Cocktail (Roche, Mannheim, Germany) as recommended by the manufacturer. Protein concentration was determined with a Bio-Rad microassay (Hercules, CA). The lysates (80 µg) were separated by 7.5% polyacrylamide gel electrophoresis and blotted onto nitrocellulose membranes. The membranes were stained with Ponceau S to verify equal lane loading. Western blotting was performed with rabbit polyclonal anti-Arg antibodies [17] directed against the SH2 and SH3 domains (a kind gift of A. Koleske, Yale University, CT). Anti-actin antibodies (Sigma-Aldrich) were used to detect β-actin protein. Detection was performed with secondary antibodies coupled to horseradish peroxidase and the SuperSignal Detection System (Pierce, Rockford, IL).

RESULTS 

RT-PCR Qualitative Analysis of Arg Transcripts 

The RNA extracted from several cell lines of hematopoietic, epithelial, nervous, and connective origin was analyzed by RT-PCR with two different sets of primers: 41N/42N and 41N/132N. The 41N reverse primer was located on the common exon IV of Arg. The 42N and 132N sense primers were located on exons 1A and 1B, respectively. Both primer sets amplified two bands of different sizes: 466/403 bp for 41N/42N and 523/460 bp for 41N/132N. This demonstrates that the ABL2(ARG) gene is normally expressed as four different 5'-end transcript isoforms, here called 1A long and short (1AL, 1AS) and 1B long and short (1BL, 1BS) (Figure 2). This was also confirmed with primers spanning different exons and specific for the individual isoforms (not shown). All these PCR products were sequenced. The nucleotide sequence showed that a 63 bp sequence, coding for 21 amino acids, is alternatively juxtaposed to the 1B and 1A first exons. The alternative splicing of this sequence maintains the open reading frame (Figure 2). Based on the intron-exon junctions of Arg (derived from the NCBI Genome database), the 63 bp sequence is flanked by the consensus acceptor (AG) and donor (GT) splice sites. Only the 1AS and 1BL forms were described when Arg cDNA was first cloned [3]. In the t(1;12) translocation present in leukemic patients, however, an identical 63 bp sequence had been found alternatively fused to the Arg common exons in the rearranged transcript [24-26]. Because the 1A and 1B first exons are absent from these rearranged transcripts, it was impossible to establish whether this additional splicing event involved both the A and B forms of Arg.
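The band sizes reported above can be checked against the sequencing result with simple arithmetic. Within each primer pair, the long and short bands share both primers, including the 5' tails. Their difference is therefore the length of the alternatively spliced segment, and a length divisible by three keeps the reading frame. A minimal sketch using only the band sizes given in the text (the function name is illustrative):

```python
def spliced_segment(long_bp, short_bp):
    """Length of the segment present only in the long isoform, and its effect on frame."""
    seg = long_bp - short_bp
    in_frame = seg % 3 == 0
    codons = seg // 3 if in_frame else None
    return seg, in_frame, codons


BANDS = {
    "1A (41N/42N)": (466, 403),
    "1B (41N/132N)": (523, 460),
}

for pair, (long_bp, short_bp) in BANDS.items():
    print(pair, spliced_segment(long_bp, short_bp))
```

Both pairs give a 63 bp, in-frame segment of 21 codons. This agrees with the 21 amino acids of exon II identified by sequencing.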

A previous report [17] showed that the mouse brain Arg sequence specifically excludes an exon encoding amino acids 688 (G) to 791 (S). On this basis, we used PCR to test the same region of human Arg cDNA. Amplification of cDNA from the A172 glioblastoma cell line with the 112N/113N primer pair revealed two specific bands of 475 and 166 bp (Figure 2). The 112N/114N and 112N/115N primer pairs amplified specific bands of 142 and 207 bp, respectively, demonstrating different 3'-end forms in RNA samples from different cell types (Figure 2). The isoforms were called C-termini long (CTL) and short (CTS). Sequencing revealed that the shorter band resulted from the in-frame loss of a 309 bp fragment encoding amino acids 688 (N) to 790 (G) in the C-terminus. The lack of these 103 amino acids affects about half of the F-actin-binding domain [1,15] closest to the Arg kinase domain (Figure 1). Amplification of genomic DNA with the 112N/115N primer pair revealed a single band of 207 bp, as expected from the cDNA sequence. The 112N/114N primers did not amplify any band (not shown). Primer 114N spanned the cDNA sequences juxtaposed after the loss of the 309 bp fragment. In Arg cDNA [3], this fragment is delimited by the GGG sequence at both extremities. This GGG sequence also flanks the 5'-end of the 1B exon. From these data and the NCBI Genome Map Viewer Homo sapiens database, Build 34, Version 1, we derived the intron-exon order and the predicted putative protein isoforms of Arg (Figure 1).
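The C-terminal deletion can be cross-checked in the same way. Three values should agree: the band difference from the 112N/113N pair, the length of the sequenced deletion, and three times the number of residues removed. In addition, a genomic product of the same size as the cDNA product is consistent with no intron lying between those primers. A minimal sketch using only numbers stated in the text; the function names are illustrative, and the intron test is a simple size comparison, not a substitute for sequencing.

```python
def residues_removed(first, last):
    """Number of residues in an inclusive range, e.g. 688..790."""
    return last - first + 1


def deletion_consistent(long_bp, short_bp, deleted_bp, first_res, last_res):
    """Band difference, sequenced deletion, and residue span must all agree."""
    return (long_bp - short_bp == deleted_bp
            and deleted_bp == 3 * residues_removed(first_res, last_res))


def intron_between_primers(genomic_bp, cdna_bp):
    """A genomic product longer than the cDNA product implies an intervening intron."""
    return genomic_bp > cdna_bp


# Values from the text: 475/166 bp bands, 309 bp deletion, residues 688-790,
# and a 207 bp 112N/115N product from both genomic DNA and cDNA.
print(residues_removed(688, 790))
print(deletion_consistent(475, 166, 309, 688, 790))
print(intron_between_primers(207, 207))
```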



Real-Time PCR Quantitative Analysis of Arg Transcripts 

Total Arg transcripts 

Real-time PCR with the α/β primer pair and the γ
probe complementary to common exons (Table 2)
confirmed that the total Arg transcript was more abundant in mature (LP1, Raji, Jurkat) than in immature (ALLPO, Molt-4) lymphoid leukemic cell lines (Figure 3), as we previously showed with competitive PCR [29]. We also confirmed semiquantitative PCR results showing increased Arg transcript expression in the HL-60 myeloid leukemia cell line when it differentiated toward granulocytes or macrophage-like cells [28]. Among the cell lines tested, Arg transcript expression was highest in the A172 glioblastoma cells. In these cells, c-Abl protein is not expressed because of the loss of functionally active germline ABL alleles [38].




Figure 3. Relative levels of the total Arg transcripts in different human cell types as evaluated by real-time PCR. The values, expressed as 2^-ΔΔCT [37], represent the fold of Arg expression in each cell type with respect to the LP-1 calibrator cell line, which is assigned a value of 1. (A) Cell lines. Mean values of two independent experiments performed in triplicate; the vertical bars indicate the range of variability. (B) Cells and tissues from single individuals, denoted by the letters A through H. N, normal; T, tumor; M1, M5, acute myelogenous leukemia FAB types.
The mature lymphocytes, monocytes, and granulocytes from individual normal donors had higher Arg transcript expression than the tumor cell lines of the lymphocytic (LP1, Raji, ALLPO, Molt-4, Jurkat), monocytic (U937), and granulocytic (HL-60, GFD8) lineages. Their expression was also higher than in the blasts of myelocytic (M1) and monocytic (M5) spontaneous acute leukemia. Higher Arg expression was also observed in normal colon mucosa and in one renal cortex than in the respective carcinomas. On the whole, Arg appeared to be down-regulated in tumor cells with respect to normal cells.

5'-End isoforms 

Each 5'-end isoform was quantified separately with specific sets of primers and probes (Table 2). The four Arg isoforms were present in different amounts (Figure 4A and B) and had a characteristic expression pattern in the diverse cell types. The prevailing form was 1BL, followed by its short counterpart (1BS), in the following cells: hematopoietic cell lines, the Lan-5 neuroblastoma cell line, lymphocytes, monocytes, granulocytes, and the M1 and M5 leukemic blasts. In these cells, the 1AL form could be quantified. The 1AS form, the only 1A form described so far in the literature [3], was in general the least represented. The most expressed form in the A172 glioblastoma and Hel-299 fibroblastic cell lines was 1AL, followed by 1AS, although the B forms were quantifiable. In the Caki-1 and Hela epithelial cell lines, the 1AL form was the most abundant. All the other forms (1AS, 1BL, and 1BS) were consistently represented, accounting for 15%-25% of the total Arg level. The two normal renal tissues showed a relative distribution of the forms in favor of 1A. In the two renal carcinomas, this pattern changed, with a redistribution of the forms and a prevalence of 1BL. The most represented forms in the two normal colon mucosa samples were 1BL and 1BS, and this was also true for the tumor counterparts.

3'-End isoforms 

Differential quantification of the 3'-end isoforms showed that all the tested cells contained both CTS and CTL (Figure 4C and D). CTS was prevalent in B lymphoid cell lines. However, CTS decreased and CTL concurrently increased across the neoplastic LP1 (plasma cells), Raji (mature B cells), and ALLPO (early pre-B cells) cell lines, reflecting their differences in maturation. CTS was also greater than CTL in K562 myeloid and Hel-299 fibroblast cells. In the Caki-1 and Hela epithelial cell lines, CTS was more abundant, but the level of CTL was approximately similar. CTL was prevalent in the Jurkat and Molt-4 T cell lines, the U937 monocytic cell line, and the A172 glioblastoma and Lan-5 neuroblastoma cell lines. All donor lymphocytes (mainly T cells), monocytes, and granulocytes showed a prevalence of the CTL form. The two forms were equally expressed in one case of normal colon mucosa, whereas CTL was slightly predominant in the other. In both cases of colon carcinoma, CTS was greater than CTL. The chief form in the two normal renal cortices was CTS. This pattern was unchanged in one of the renal carcinomas but was inverted in the other.

Real-Time PCR Quantitative Analysis of the Arg Transcript 
Isoforms in Treated Cells 

HL-60 cells were differentiated to granulocytes with 1 µM ATRA and to macrophage-like cells with 10 nM TPA. The ATRA-treated HL-60 cells stopped growing at d 4 (Figure 5A), showing granulocytic phenotype and morphology [28]. The TPA-treated HL-60 cells stopped growing at d 2 (Figure 5A). About 60% of them were adherent to the flask and were fully viable macrophage-like cells [28]. In the granulocytic differentiation of ATRA-treated HL-60 cells, the expression pattern of the 5'-end isoforms did not change significantly, but a prevalence of the 3'-end CTL form
(Figure 5B) as in the granulocytes of volunteer donors
(Figure 4D) was observed. In the macrophage-like differentiation of TPA-treated HL-60 cells, the expression profile of the 5'-end isoforms changed dramatically, with a significant increase in the 1A forms, particularly 1AL. The percentages of the 3'-end forms CTS and CTL were redistributed. However, the increase in CTL was insufficient to make it more abundant than CTS (Figure 5B).

Given that in HL-60 cells differentiation is associated with growth arrest, we also investigated the GFD8 cell line, in which proliferation blocking can be dissociated from differentiation. The GFD8 cells share properties with early myeloid progenitor cells and depend on GM-CSF for growth. The presence of GM-CSF does not change the cellular phenotype [34]. Removal of the growth factor for 4 d leads to reversible growth arrest (Figure 5A) in the absence of differentiation and without a significant loss of viability [35]. GM-CSF deprivation led to a change in Arg expression at d 4, with the 1A forms, especially 1AL, becoming the most abundant. Among the 3'-end forms, the relative difference between CTS and CTL increased, with CTL remaining preponderant (Figure 5B).
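The figure legends state that isoform levels were calculated as 2^-ΔCt and reported as a percentage of total Arg transcripts. The short Python sketch below illustrates that arithmetic only. It assumes that ΔCt is taken against a reference gene measured in the same sample (the text does not name one), and the Ct values are hypothetical, not data from this study.

```python
"""Sketch: express Arg transcript isoforms as percentages of total Arg
using relative quantities of the form 2**(-dCt).

Assumption (not stated in the text): dCt = Ct(isoform) - Ct(reference gene)
in the same sample. All Ct values used here are hypothetical.
"""


def relative_quantity(ct_target: float, ct_reference: float) -> float:
    """Return 2^-dCt for one isoform, with dCt = Ct(target) - Ct(reference)."""
    return 2.0 ** (-(ct_target - ct_reference))


def isoform_percentages(ct_by_isoform: dict[str, float], ct_reference: float) -> dict[str, float]:
    """Convert per-isoform Ct values into percentages of the summed Arg signal."""
    quantities = {name: relative_quantity(ct, ct_reference)
                  for name, ct in ct_by_isoform.items()}
    total = sum(quantities.values())
    return {name: 100.0 * q / total for name, q in quantities.items()}


if __name__ == "__main__":
    # Hypothetical 3'-end Ct values for one cell line (not data from this study).
    cts = {"CTS": 24.0, "CTL": 25.0}
    for name, value in isoform_percentages(cts, ct_reference=18.0).items():
        print(f"{name}: {value:.1f}%")
```

Because every isoform in a sample shares the same reference Ct, the reference term cancels when the values are converted to percentages; the reference gene affects the absolute 2^-ΔCt values but not the relative isoform distribution. The calculation also presumes similar amplification efficiencies for the isoform-specific assays.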

Arg Protein Isoforms Evaluated by Western Blotting 

Western blotting analysis of different cell lysates with anti-Arg antibodies revealed a set of bands as previously described [17,29]. The absence or presence of the 21 amino acids of exon II was not sufficient to resolve the different N-terminal isoforms by electrophoretic mobility on a one-dimensional polyacrylamide gel. Thus, the differently sized bands detected by anti-Arg antibodies probably reflected the different sizes of the Arg coding region

Figure 4. Relative levels of the different isoforms of the Arg transcripts in different human cell types. The values calculated as 2^-ΔCt [37] are reported as the percentage of the individual isoforms with respect to the total Arg transcripts. (A, B) 5'-end isoforms (1AS; 1AL; 1BL; 1BS). (C, D) 3'-end isoforms (CTS; CTL). (A, C) Cell lines. Mean values of two independent experiments performed in triplicate; the vertical bars indicate the range of variability. (B, D) Cells and tissues from single individuals denoted by the letters A through H. N, normal; T, tumor; M1, M5, acute myelogenous leukemia FAB types.
Figure 5. Relative levels of the different isoforms of Arg transcripts in treated cell lines. (A) Growth rate of HL-60 cells, untreated (○) and treated (●) with ATRA or TPA, and of GFD8 cells cultivated in the presence (○) or absence (●) of GM-CSF. Exponentially growing
cells were plated in 50 mL of medium at a density of 3 x lOVmL 
The total cell number was determined in a Thoma chamber. In the case of the TPA-treated cells, the data refer to the adherent cells detached from the flask after incubation at 37°C with trypsin, and the counted cells are expressed as the number of cells/mL of the initial 50 mL volume. Mean values ± SD of three independent experiments. (B) HL-60 cells untreated and treated with 1 µM ATRA for 4 d or 10 nM TPA for 2 d, and GFD8 cells cultivated in the presence or absence of 5 ng/mL GM-CSF for 4 d. Top: 5'-end isoforms (1AS; 1AL; 1BL; 1BS). Bottom: 3'-end isoforms (CTS; CTL). The values calculated as 2^-ΔCt [37] are reported as the percentage of single isoforms with respect to the total Arg transcripts. Mean values of two independent experiments performed in triplicate; the vertical bars indicate the range of variability.



caused by the loss of 103 amino acids at the C-terminus (Figure 6). The protein bands revealed in the different cell types also had different relative intensities, with general agreement between band intensity and the relative amounts of CTL and CTS transcripts in the different cell lines as quantified by real-time PCR. The shorter protein was more abundant in K562 cells, and the longer protein in Molt-4 and Lan-5 cells, whereas both proteins were roughly equivalent in the Caki-1 and HeLa cells. In contrast, in the LP-1 cells the most abundant band corresponded to the longer protein, which disagreed with the transcript data. Finally, it is worth noting that the longer protein in A172 cells ran faster than that in Lan-5 and the other cells. Moreover, the total amount of Arg proteins did not correlate with the Arg mRNA level in the A172 cells (Figure 3).

Figure 6. Western blot analysis of 80 µg of cell lysate from different cell lines separated by 7.5% polyacrylamide gel electrophoresis and probed with polyclonal anti-Arg and anti-β-actin antibodies. The different panels are different blots, and the exposure time ranged from 30 to 60 s.

DISCUSSION 

During our PCR analysis of the 5'-end of human Arg cDNA, we found two differently sized amplified fragments (long and short) for both the 1A and 1B forms. These 1AL, 1AS, 1BL, and 1BS cDNAs diverged at positions unique to the long and short forms, suggesting that they arose from the combination of two alternative splicing events, one involving the 1A and 1B exons and the other a second exon. The fact that a 63 bp fragment (which is flanked by a splice acceptor and donor site in the genome) can be alternatively juxtaposed to exons 1A and 1B of Arg suggests that this sequence is the real exon II. In human Arg, the 1A exon therefore ends at amino acid 37 (T) and the 1B exon at amino acid 52 (H), and either may or may not be followed by the 21 amino acids of exon II. The long Arg forms (1AL or 1BL) containing exon II were always the most abundant in the cells studied, which suggests that this exon plays an important role in the function of both the A and B forms of Arg proteins, although the exact function of the 21 amino acid sequence has not yet been established. Our findings are also supported by the fact that the translocation involving the ABL2 (ARG) gene in leukemia produces ETV6/ARG fusion proteins containing (long form) or lacking (short form) exon II [4-6]. It has been shown that the presence of exon II in the long ETV6/ARG protein leads to more pronounced oncogenic activity than in the short form [39]. The presence of this Arg exon II may therefore be important. It may also mirror the situation in the fusion proteins of c-Abl (ETV6/ABL; BCR/ABL), in which the fusion point falls soon after the end of the first exon of c-Abl, although the second exon of c-Abl has no analogy with this second exon of Arg. Furthermore, the N-terminal domains of Arg and c-Abl seem to play an important role in regulating their catalytic activity [40,41], and therefore any study of the kinase activity of Arg should consider the unique presence of Arg exon II, keeping it distinct from exons 1A and 1B.
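The four predicted N-terminal variants follow from two independent choices: the first exon (1A or 1B) and inclusion or skipping of the 63 bp exon II. A minimal Python sketch of that combinatorics is shown below. It uses the residue positions given above (exon 1A ending at residue 37, exon 1B at residue 52) and assumes, for simplicity, that each exon contributes whole codons; the function names are illustrative only.

```python
"""Sketch: enumerate the four predicted Arg N-terminal isoforms (1AS, 1AL, 1BS, 1BL)
from two independent splicing choices: first exon (1A or 1B) and exon II in/out.

Residue counts come from the text; treating exons as whole codons is an assumption.
"""
from itertools import product

FIRST_EXON_END = {"1A": 37, "1B": 52}  # last residue encoded by each first exon
EXON_II_BP = 63                        # length of exon II in nucleotides


def exon_ii_residues(length_bp: int = EXON_II_BP) -> int:
    """Return the residues added by exon II; an in-frame exon must be a multiple of 3."""
    if length_bp % 3 != 0:
        raise ValueError("exon II would shift the reading frame")
    return length_bp // 3


def n_terminal_isoforms() -> dict[str, int]:
    """Map isoform name to the number of residues encoded before the next shared exon."""
    extra = exon_ii_residues()
    isoforms = {}
    for first_exon, include_exon_ii in product(FIRST_EXON_END, (True, False)):
        name = first_exon + ("L" if include_exon_ii else "S")
        isoforms[name] = FIRST_EXON_END[first_exon] + (extra if include_exon_ii else 0)
    return isoforms


if __name__ == "__main__":
    for name, length in n_terminal_isoforms().items():
        print(name, length)
```

Under these assumptions, the long forms carry 58 (1AL) and 73 (1BL) residues ahead of the shared sequence, against 37 (1AS) and 52 (1BS) for the short forms.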

Our real-time PCR data showed that the 5'-end short and long forms of Arg mRNA are consistently represented in cells, and four different N-terminal proteins can be predicted from the Arg cDNA sequence, even though the difference in N-terminal amino acids is too small to allow their separation by one-dimensional gel electrophoresis. The expression pattern of the 5'-end isoforms differs among cell types, and, in general, there seems to be a specific prevalence of either the 1AL and 1AS or the 1BL and 1BS isoforms in the various cells. This may suggest, for example, that the 1B first exon has a specific role in the hematopoietic cells, in which the 1BL and 1BS isoforms prevail. Moreover, the specific function of the B forms may be differently modulated by the distribution of the long and short forms, which, having distinct N-termini, may differentially regulate catalytic activity [40,41]. The same can be said for the cells in which the A isoforms predominate.

All of the isoforms may therefore have an as yet unknown functional role, a hypothesis that is also supported by the changes observed during HL-60 cell differentiation and growth arrest of GFD8 cells, and by the fact that, in addition to a prevalent form, the others are similarly abundant in some cell types (Caki-1, HeLa). The functional impact of the various isoforms may of course differ depending on their relative abundance. It is worth noting that four mRNA isoforms of mouse c-Abl that diverge at the first exon have been described [12]: the type I and IV isoforms are predominant and translated, and it has been suggested that these two proteins play different roles, with type I being involved in LPS-induced lymphoid differentiation and type IV in apoptosis [42].

The PCR-revealed 309 bp deletion in the last exon of Arg causes the loss of 103 amino acids from the C-terminus, affecting the actin-binding domain closest to the kinase domain. The two major protein bands revealed by Western blotting with anti-Arg antibodies may represent the two translated isoforms, which, on the basis of their amino acid composition, differ by about 10 kDa. The expression pattern of the 3'-end transcript isoforms also varies among cell types, and their relative ratio is generally maintained in the proteins they probably encode. The presence of different Arg protein isoforms may have various implications. C-terminal domains of Arg with different actin-binding domains might lead to different interactions with the actin cytoskeleton and to distinct localizations or concentrations of the N-terminal isoforms, which might be involved in the activation of different metabolic pathways [40]. A downregulation of Arg expression was observed in the tumors studied, as reported in other tumors [7-10]. The Arg isoforms might also play a role in neoplasms in which altered Arg expression can be accompanied by variation in the expression pattern of specific Arg isoforms, as we noted in a few tumor cases. Although these data need to be confirmed in a greater number of cases, they open the possibility that any variation in the expression pattern of Arg isoforms might have different and specific effects on morphology and motility or on other specific biological events in tumor cells.
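As a rough consistency check on the sizes discussed above: an in-frame deletion of 309 bp removes 309/3 = 103 codons, and exon II adds 63/3 = 21 residues. The Python sketch below converts residue counts to approximate masses using a rule-of-thumb average of about 110 Da per residue. That average is an assumption; the text's figure of about 10 kDa is based on the actual amino acid composition.

```python
"""Back-of-the-envelope estimate of the mass differences between Arg splice variants.

Assumption: ~110 Da per residue is a common rule-of-thumb average; the true
difference depends on the exact amino acid composition of the missing segment.
"""

AVERAGE_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def residues_from_inframe_deletion(deleted_bp: int) -> int:
    """Residues removed by an in-frame deletion of `deleted_bp` nucleotides."""
    if deleted_bp % 3 != 0:
        raise ValueError("deletion is not in frame")
    return deleted_bp // 3


def approx_mass_kda(n_residues: int) -> float:
    """Approximate mass of n residues in kDa using the average residue mass."""
    return n_residues * AVERAGE_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    c_term = residues_from_inframe_deletion(309)  # 3'-end deletion (CTS vs CTL)
    print("C-terminal deletion:", c_term, "residues,", round(approx_mass_kda(c_term), 1), "kDa")
    print("Exon II:", 21, "residues,", round(approx_mass_kda(21), 1), "kDa")
```

The estimate of roughly 11 kDa for the C-terminal deletion is of the same order as the ~10 kDa stated above, while the ~2 kDa contributed by exon II is a small fraction of a protein well over 100 kDa, consistent with the observation that the N-terminal isoforms could not be separated on one-dimensional gels.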
Abstract

Mutations in mouse and human patched (PTCH) genes are associated with birth defects and cancer. PTCH, a 12-pass transmembrane protein, is a receptor for Sonic hedgehog (Shh) signaling proteins. Shh proteins activate transcription of target genes, including PTCH, via GLI transcription factors. Here we identified seven and five isoforms of human and mouse PTCH mRNA, respectively, which are generated by the complex alternative use of five exons as the first exon (exons 1a to 1e in 5'-to-3' order). Although the expression profiles of these isoforms were highly variable among human tissues, three of them, PTCHa, PTCHb, and PTCHd, were predominantly expressed in most tissues, PTCHd being the most ubiquitous. In contrast, PTCHb was always predominant during mouse development and reached a maximum at E10.5. These three mRNA isoforms encode three PTCH proteins with distinct N-termini, PTCH_L, PTCH_M, and PTCH_S. The expression of these three isoforms was regulated by GLI transcription factors, and at least two functional GLI-binding sequences were identified, one in exon 1a and the other between exon 1a and exon 1b. PTCH_L and PTCH_M were equally active in suppressing GLI-mediated transcription and inducing apoptosis. PTCH_S (encoded by PTCHd), which lacks the first transmembrane domain, was less stable than the other two, resulting in reduced activity. This study may shed light on the mechanism whereby a single PTCH gene plays a role in both tumor cell growth and embryonic development.
© 2004 Elsevier Inc. All rights reserved. 

Keywords: Patched; Sonic hedgehog; Basal cell carcinoma; Medulloblastoma; Alternative splicing 



The Sonic hedgehog (Shh) signaling cascade is pivotal to embryonic development: holoprosencephaly (HPE), characterized by a failure of the forebrain to separate completely into hemispheres, and HPE-like abnormalities are associated with a loss of Shh function in humans and in mice [1-3]. The role of the Shh pathway in tumorigenesis was also established with the discovery that inactivating mutations in the Patched (PTCH) gene, which encodes one component of the Shh receptor, are responsible for the inherited cancer predisposition disorder known as Gorlin's or nevoid basal cell carcinoma syndrome (NBCCS) [4,5], as well as for sporadic basal cell carcinomas (BCCs) and medulloblastomas [6-8]. NBCCS is an autosomal dominant neurocutaneous disorder characterized by developmental abnormalities such as palmar and plantar pits, jaw cysts, calcification of the falx cerebri, and skeletal anomalies, and also by a predisposition to cancers such as BCC and medulloblastoma [9]. Familial and sporadic BCCs display loss of heterozygosity in this region, consistent with PTCH being a tumor suppressor gene [6,10]. In addition, activating mutations in Smoothened (Smo), which encodes another component of the Shh receptor, have been detected in BCCs [11], further emphasizing the importance of this pathway in tumor development. More importantly, the recent finding that this pathway is essential for the growth of a wide range of tumor types not associated with NBCCS, such as lung cancers or digestive tract tumors, sheds light on potential new diagnostic and therapeutic approaches [12-14].

0888-7543/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2004.11.014

K. Nagao et al. / Genomics 85 (2005) 462-471

PTCH, a 12-pass transmembrane protein, is the ligand-binding component of the Shh receptor complex. In the absence of Shh binding, PTCH is thought to hold Smo, a 7-pass transmembrane protein, in an inactive state and thus inhibit signaling to downstream genes. Upon the binding of Shh, the inhibition of Smo is released and the signal is transduced, leading to the activation of target genes by the Gli family of transcription factors [15]. The transcription of PTCH itself is induced by Shh pathway activity [16], thus generating a negative feedback loop, which may play an important role in tumor suppression by preventing sustained activation of the pathway.

Hahn et al. predicted that three different forms of the PTCH protein are present in humans: the ancestral form and two human-specific forms [4]. Recently, a detailed characterization of three alternative first exons was reported [17]. However, our study using the 5' rapid amplification of cDNA ends (5'-RACE) technique revealed an additional first exon and unexpectedly complex splicing between the first and the second exons that is evolutionarily conserved across species. Therefore, characterizing the several potential forms of the PTCH protein may reveal the mechanism whereby a single PTCH gene could play a role in different pathways, and determining how the different splice forms of PTCH mRNA are regulated may shed light on the apparent role of the gene in tumor cell growth as well as embryonic development. Here we characterize multiple isoforms of PTCH in humans and mice and discuss the functions of their products, their expression profiles, and their transcriptional regulation.



Results 

Isolation of isoforms of human and mouse PTCH 

PTCH is a multiexon gene comprising 23 exons distributed over a region of ~70 kb. To date, three cDNA sequences encoding the human PTCH gene's first exon have been reported and named exons 1, 1A, and 1B [17], and another exon has recently been deposited with GenBank (exon 1a described below, GenBank Accession No. BC043542). In contrast, only a single mRNA species of PTCH has been reported in mice [18] (GenBank Accession No. U46155). The use of alternative exons generates several mRNA isoforms. On the basis of this background, we performed a comprehensive analysis of the 5' structure of mRNA species derived from the human PTCH gene using the 5'-RACE technique. Sequencing of 31 RACE clones revealed an additional alternative first exon (exon 1c described below, submitted to GenBank as Accession No. AB189438) and complex splicing between the first and the second exon. Using a genomic sequence containing the PTCH gene (GenBank Accession No. AL161729), the precise genomic organization of the human PTCH gene was determined as shown in Fig. 1. For the sake
Fig. 1. Identification of human and mouse PTCH isoforms. (A) Comparison of human and mouse exon-intron boundaries. Upper- and lowercase letters indicate exon and intron sequences, respectively. Nucleotides not conserved between the two species are underlined. Alternative splice donor sites are indicated by arrowheads. (B) 5' region of the human and mouse PTCH gene structure. The 5' ends of the mouse first exons have not been determined. (C) 5' structure of PTCH isoforms. The positions of the first methionine codons and in-frame stop codons are indicated by arrows and asterisks, respectively. In four of the seven mRNAs, in-frame stop codons were identified. The first in-frame methionine codon could be determined in the other three transcripts because the 5'-RACE system we employed amplifies only full-length transcripts [47]. (D) PTCH protein isoforms encoded by the mRNA species described in (C). Numbers refer to amino acid positions relative to the first methionine of PTCH_L. The positions of the 12 transmembrane regions are indicated by filled boxes. PTCH_L has 65 unique amino acid residues at the N-terminus, depicted with a shaded box.
of simplicity, we named the first exons 1a to 1e on the basis of their 5'-to-3' order. Thus, exons 1b, 1d, and 1e are the former exons 1B, 1, and 1A, respectively. In addition to multiple first exons, we found that alternative 5' splice sites allow the shortening of exons 1a and 1c, generating exons 1a' and 1c' (Fig. 1C). The complex alternative splicing described above thus generates up to seven mRNA species, each with its own distinct 5' sequence (Figs. 1B and 1C). RT-PCR using isoform-specific forward primers for each alternative exon 1 and a common reverse primer for exon 2 indeed validated the existence of the seven different mRNAs. These mRNA isoforms encode four PTCH
proteins termed PTCH U PTCH L >, PTCH M ,_ and PTCH S , 
(Figs. 1C and 1D). PTCH_S is an N-terminally truncated PTCH protein that lacks the first transmembrane domain (Fig. 1D). Although only a single species of PTCH mRNA has been reported in mice, a comparison of the human PTCH genomic sequence with the mouse sequence (NCBI Locus NT_039587) suggested the existence of multiple first exons. In this study, mouse and human Patched genes are collectively referred to by the human nomenclature (PTCH), although mouse Patched is often called Ptc. RT-PCR using forward primers designed in the putative mouse first exons and reverse primers in exon 2 demonstrated that most of the PTCH isoforms found in humans are indeed conserved in mice. At least in mouse P19 cells and in the several mouse tissues from which total RNA was extracted, PTCHa' and PTCHc were not detected, and the splice donor site of exon 1e differed from that in humans (Fig. 1A). All exons were flanked by splice junctions that conformed to the consensus GT-AG rule, except for the exon 1a'-exon 2 junction in humans, in which a GC-AG intron was observed. GC-AG introns are occasionally found and are processed by the same splicing pathway as conventional GT-AG introns.
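The count of seven human mRNA species follows from five alternative first exons plus alternative 5' splice sites in exons 1a and 1c. The Python sketch below enumerates these variants using the naming scheme in the text. It is illustrative only: it models first-exon choice, not which of the four proteins each mRNA encodes, and the mouse list simply removes the two species that the text reports as undetected.

```python
"""Sketch: enumerate PTCH 5' mRNA variants from alternative first exons plus
alternative 5' splice (donor) sites, following the naming used in the text.

H: five first exons (1a-1e); exons 1a and 1c also have a shortened form
(1a', 1c') produced by an alternative donor site. Mouse: the text reports that
PTCHa' and PTCHc were not detected. Everything else here is illustrative.
"""

FIRST_EXONS = ["1a", "1b", "1c", "1d", "1e"]
ALTERNATIVE_DONOR = {"1a", "1c"}  # exons with a second 5' splice site


def first_exon_variants(exons: list[str], alt_donor: set[str]) -> list[str]:
    """Return every first-exon variant, adding a primed form for exons with two donors."""
    variants = []
    for exon in exons:
        variants.append(exon)
        if exon in alt_donor:
            variants.append(exon + "'")  # shortened exon from the alternative donor
    return variants


def mrna_names(variants: list[str]) -> list[str]:
    """Name each mRNA after its first exon, e.g. 1a' -> PTCHa'."""
    return ["PTCH" + v[1:] for v in variants]


if __name__ == "__main__":
    human = mrna_names(first_exon_variants(FIRST_EXONS, ALTERNATIVE_DONOR))
    mouse = [name for name in human if name not in {"PTCHa'", "PTCHc"}]
    print(len(human), human)
    print(len(mouse), mouse)
```

With these inputs the sketch yields seven human species (PTCHa, PTCHa', PTCHb, PTCHc, PTCHc', PTCHd, PTCHe) and five mouse species, matching the numbers given in the Abstract.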

Expression profiles of three isoforms of PTCH in various tissues

Selective usage of the 5'-most exons suggests complex tissue-specific transcriptional regulation. Therefore, to investigate the expression profiles of PTCH isoforms, RT-PCR was performed with isoform-specific primers for the alternative first exons using total RNA from a panel of human tissues, and the products were analyzed with an Agilent 2100 bioanalyzer. As shown in Fig. 2A, PTCH was expressed in a wide range of human tissues. However, the levels of total PTCH RNA varied among human tissues. For example, the heart and liver showed low levels of expression, which is largely consistent with previous reports on human and mouse PTCH expression [18,20]. Expression profiles of the PTCH isoforms were also highly variable among tissues. While PTCHd (encoding PTCH_S) was widely expressed, the expression of PTCHa (encoding PTCH_M) and PTCHb (encoding PTCH_L) was relatively restricted. For example, PTCHb was expressed in all the
Fig. 2. Expression profiling of PTCH isoforms. (A) RT-PCR analysis of expression profiles in various tissues. Total RNA obtained from a panel of human tissues was subjected to RT-PCR. Forward primers specific to each of the first exons and a common reverse primer for exon 2 were synthesized and used for PCR. The RT-PCR products were quantified with an Agilent 2100 bioanalyzer. PTCH expression levels were normalized to those of GAPDH. Exons with relative expression levels lower than 0.007 do not appear in the graph. (B) RT-PCR analysis of expression profiles at various mouse developmental stages. Total RNA obtained from mouse embryos at various developmental stages was subjected to RT-PCR using mouse-specific primers. Mean PTCH expression levels normalized to β-actin expression are presented at the bottom (n = 2-4).

analyzed tissues apart from liver, while the expression of PTCHa was more restricted, showing virtually no expression in the heart, thymus, liver, and trachea. The other PTCH isoforms, using exons 1a', 1c, 1c', and 1e, were expressed at very low levels, if at all, throughout the tissues. Therefore, we focused on PTCHa, PTCHb, and PTCHd in further experiments. Because Shh signaling plays a key role in embryonic development, we next investigated the expression profile during mouse embryogenesis. Consistent with a previous report, the expression of PTCH reached a maximum at E10.5, when the limb buds become increasingly prominent, and declined thereafter [18]. Notably, in contrast to human adult tissues, PTCHb was always the prevalent isoform during embryonic development (Fig. 2B).
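The normalization described in the Fig. 2 legend (each isoform product divided by the GAPDH product, with relative levels below 0.007 omitted from the graph) can be sketched in Python as follows. The signal values are hypothetical placeholders, and the sketch assumes the bioanalyzer output has already been reduced to one number per RT-PCR product.

```python
"""Sketch of the Fig. 2 normalization: isoform RT-PCR signals are divided by the
reference-gene (GAPDH) signal from the same tissue, and isoforms whose relative
level is below 0.007 are not displayed. All input values are hypothetical.
"""

DISPLAY_CUTOFF = 0.007  # threshold taken from the Fig. 2 legend


def normalize_to_reference(signals: dict[str, float], reference: float) -> dict[str, float]:
    """Divide each isoform signal by the reference-gene signal (e.g., GAPDH)."""
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return {isoform: value / reference for isoform, value in signals.items()}


def displayed(levels: dict[str, float], cutoff: float = DISPLAY_CUTOFF) -> dict[str, float]:
    """Keep only isoforms whose relative level is not lower than the cutoff."""
    return {isoform: level for isoform, level in levels.items() if level >= cutoff}


if __name__ == "__main__":
    # Hypothetical bioanalyzer quantities for one tissue (arbitrary units).
    raw = {"PTCHa": 12.0, "PTCHb": 30.0, "PTCHd": 45.0, "PTCHe": 0.5}
    levels = normalize_to_reference(raw, reference=100.0)  # hypothetical GAPDH signal
    print(displayed(levels))
```

In this made-up example the PTCHe product falls below the cutoff after normalization and would be omitted, mirroring how low-level isoforms drop out of the graph.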

Transcriptional regulation of PTCH isoforms by GLI 

It is well known that PTCH itself is one of the target genes in the Shh signaling network, creating a negative feedback loop and a balance through the antagonism between Shh and PTCH. Even though the GLI proteins may not be the only mediators of Shh signaling, the overwhelming majority of available data on insects and vertebrates indicates a central role for GLI proteins in mediating and interpreting Shh signals. As shown in Fig. 3, the expression of all three PTCH isoforms was elevated by GLI1 in the cell lines we employed. However, closer observation revealed slight differences in the degree of induction. For example, PTCHd and PTCHb were more strongly upregulated by GLI1 in 293T and HSC-2 cells, whereas the induction of PTCHa was more evident than that of PTCHb or PTCHd in Ho-1-u-1 and LK-2 cells, indicating cell type-specific regulation of the isoforms.

PTCH promoter has functional GLI-binding sites

The Drosophila patched gene (ptc) has a cluster of three GLI consensus binding sites (5'-TGGGTGGTC-3' or 5'-GACCACCCA-3') [21] in the promoter region that is required for reporter gene expression in response to Hedgehog (Hh) activity [22]. Recently, it was reported that the transcriptional regulation of PTCH by Shh signaling is mediated by a single GLI-binding site located ~400 bp upstream of exon 1b (GLI-BS1 in Fig. 4A) [23]. However, sequencing farther upstream revealed two additional consensus GLI-binding sequences not reported previously (GLI-BS2 and GLI-BS3 in Fig. 4A, at -3965 and -8283 bp relative to the reported transcription start site of exon 1b, respectively). The mouse upstream sequence also contained three putative consensus GLI-binding sites, and
Fig. 3. Transcriptional regulation of three PTCH isoforms. Cell lines indicated at the top were transfected with the expression plasmid pSRα-Flag-Gli1 or pSRα-Flag-Gli1Δzfd. pSRα-Flag-Gli1Δzfd is a plasmid for a mutant GLI1 lacking the zinc finger domain [43], used as a negative control for pSRα-Flag-Gli1. Cells were cultured in 0.5% FCS for 16 h after the transfection, and total RNA was extracted from the transfected cells and subjected to RT-PCR. Forward and reverse primers were constructed for the exons indicated in parentheses. PTCH (6-7) indicates overall PTCH expression because exons 6 and 7 are used regardless of the isoform. The expression of Flag-tagged GLI1 proteins was confirmed by immunoblotting using an anti-Flag antibody (Anti-Flag).



the sequences around these sites were strikingly conserved (Fig. 4B). This suggests that the two upstream consensus GLI-binding sequences, as well as the previously reported one, act as GLI-responsive elements. To test this assumption, genomic fragments containing GLI-BS1, GLI-BS2, and GLI-BS3 were inserted into a luciferase construct (pGV-PTCH1, pGV-PTCH2, and pGV-PTCH3, respectively). Cotransfection of the GLI1 expression plasmid with pGV-PTCH1 enhanced the luciferase activity in SH-SY5Y cells (Fig. 4C), confirming a previous report. In addition, as anticipated, GLI1 expression also enhanced the luciferase activity when cotransfected with reporter constructs containing the upstream GLI-binding sequences (pGV-PTCH2 and pGV-PTCH3). To confirm that these sites are indeed responsible for the GLI-mediated activation, mutations with four nucleotide substitutions were introduced into the GLI-binding sequences
(5'-T^Gr<7G,4TC-3' or S'-GATCCACTK-V, mutated 
nucleotides in italic), generating the constructs pGV-PTCH1mt, pGV-PTCH2mt, and pGV-PTCH3mt. The introduction of these mutations into the putative GLI-binding sites indeed abolished the elevation of luciferase activity induced by GLI1. Furthermore, the 1.1-kb mouse fragment containing GLI-BS1 showed a similar response to GLI1 expression (pGV-mPTCH) (Fig. 4C), suggesting that the mechanism by which PTCH expression is regulated by the Shh signaling pathway is conserved.
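The two GLI consensus strings given above, 5'-TGGGTGGTC-3' and 5'-GACCACCCA-3', are reverse complements of each other, that is, the same site read on opposite strands. A minimal Python sketch that scans a sequence for exact matches in either orientation is shown below. Exact matching is a simplifying assumption, and a sequence match alone does not establish that a site is functional or accessible in chromatin.

```python
"""Sketch: locate exact matches to the GLI consensus 5'-TGGGTGGTC-3' on either
strand of a DNA sequence. Its reverse complement, 5'-GACCACCCA-3', is the same
site on the opposite strand, so scanning the top strand for both strings covers
both orientations. Exact matching (no mismatches allowed) is an assumption.
"""

GLI_CONSENSUS = "TGGGTGGTC"
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gli_sites(seq: str, motif: str = GLI_CONSENSUS) -> list[tuple[int, str]]:
    """Return (0-based top-strand start, strand) for each exact motif match."""
    seq = seq.upper()
    patterns = {"+": motif, "-": reverse_complement(motif)}
    hits = []
    for strand, pattern in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # continue scanning; allows overlaps
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence with one site in each orientation (not a PTCH sequence).
    print(find_gli_sites("AATGGGTGGTCAAGACCACCCATT"))
```

For the toy sequence, the sketch reports one forward-strand site at position 2 and one reverse-strand site at position 13.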

We also examined whether GLI protein could physically associate with the putative GLI-binding elements in PTCH in vitro and in vivo. First, we tested these sites in an electrophoretic mobility shift assay. As shown in Fig. 4D, when GST-GLI3 fusion protein was incubated with a wild-type DNA probe containing a putative GLI consensus sequence in the promoter region, a complex with shifted gel mobility was detected (lane 3). In contrast, no shifted complex was detected when GST alone was substituted for GST-GLI3 or when a mutant DNA probe carrying the same nucleotide substitutions described above was used (lanes 2 and 6). Moreover, the DNA-protein complex was abolished by competition with an unlabeled oligonucleotide containing the GLI site, but not by a mutated oligonucleotide, demonstrating the specificity of complex formation (lanes 4 and 5). GST-GLI3 also bound specifically in vitro to the two more upstream sequences with a GLI-binding consensus sequence (lanes 9 and 15).

To determine whether GLI protein occupies these sites in vivo, we used a chromatin immunoprecipitation (ChIP) assay to analyze lysates extracted from 293T cells transfected with a plasmid to express Flag-GLI1. The genomic fragments including GLI-BS1 and GLI-BS3 were specifically precipitated as a GLI-DNA complex with an anti-Flag antibody (Fig. 4E, lanes 3 and 11), while GLI-BS2 was barely coimmunoprecipitated (lane 7). As controls, the same fragments were not precipitated when cells were transfected with a construct for the Flag tag alone or when the lysates were incubated with an anti-Myc antibody (lanes 2, 4, 10, and
Fig. 4. Transcriptional regulation of PTCH isoforms. (A) Comparison of human and mouse genomic structures. Black boxes indicate the locations and relative sizes of exons. Asterisks indicate the positions of three putative GLI-binding sequences (5'-TGGGTGGTC-3' or 5'-GACCACCCA-3'). DNA fragments inserted into luciferase vectors to make the reporter gene constructs are indicated by arrows. The names of the resulting constructs are indicated below. (B) Comparison of human (top) and mouse (bottom) GLI-binding sequences. Consensus GLI-binding sequences are underlined. Lowercase letters indicate nucleotides not conserved between the two species. (C) The PTCH promoter is GLI responsive. SH-SY5Y cells were cotransfected with various reporter gene constructs as indicated, with or without pSRα-Flag-Gli1. Cells were cultured in 0.5% FCS for 16 h after the transfection and then harvested for the luciferase assay. Firefly luciferase activity was normalized to Renilla luciferase activity from a cotransfected pRL-SV40 and is shown relative to the activity of the same reporter without pSRα-Flag-Gli1. The total amount of transfected DNA was adjusted using pcDNA3.0. Data are representative of three experiments with similar results. (D) GLI protein can bind in vitro to an oligonucleotide probe representing the PTCH gene region. Recombinant GST or GST-GLI3 protein was incubated with 32P-labeled oligonucleotide DNA probes containing a putative GLI consensus sequence (wt) or a mutated version with four nucleotide substitutions (mt), with or without a 50-fold molar excess of cold competitor containing the GLI site (competitor) or its mutant (noncompetitor). DNA-protein complexes were size-fractionated in a nondenaturing polyacrylamide gel and detected by autoradiography. The positions of the free probe and the shifted complexes are indicated by the open and closed arrows, respectively. (E) Identification of GLI-binding regions in vivo. ChIP assay was performed with genomic fragments including the putative GLI-binding consensus sequence indicated at the top. Chromatin from 293T cells transfected with pCI-Flag (lanes 2, 6, 10) or pFlag-GLI1 (lanes 3, 4, 7, 8, 11, 12) was immunoprecipitated with anti-Flag antibody (lanes 2, 3, 6, 7, 10, 11). PCR amplification was performed with the corresponding templates. Input represents a portion of the sonicated chromatin before immunoprecipitation. Anti-Myc antibody was used as a negative control (lanes 4, 8, 12).



12). Taken together, our data show that at least GLI-BS1 and GLI-BS3 are involved in GLI-mediated PTCH expression. In contrast, GLI-BS2 is not accessible to GLI in vivo, probably because of higher-order chromatin structure, although the accessibility may be cell-type dependent.

Functional analysis of three isoforms of PTCH 

In 293T cells, overexpression of PTCH protein causes apoptosis and inhibition of cell proliferation [24,25]. This suggests a basal level of leaky Smo activity that excess PTCH suppresses in the apparent absence of Shh. The fact that cyclopamine has a proapoptotic effect in these cells supports this possibility (discussed below). On this basis, we performed a functional analysis of the PTCH isoforms using a GLI-responsive luciferase reporter in 293T cells. Luciferase activities were suppressed when 293T cells were transfected with plasmids for PTCH_L and PTCH_M but not with an empty vector, pcDNA3.0 (Fig. 5A). This suppression was not observed when cells were transfected with the plasmid for PTCHΔC, which encodes only the 194 N-terminal amino acid residues, indicating the specificity of the results. To investigate the function of PTCH in vivo, PTCH was transiently expressed in 293T cells. As expected, PTCH_L and PTCH_M induced apoptosis in 293T cells, as measured by assessing the sub-G0/G1 population (Fig. 5B). However, they were not as potent as cyclopamine, a well-known inhibitor of Shh signaling [26]. This is probably due, at least in part, to the presence of untransfected cells. Interestingly, in contrast to PTCH_L and PTCH_M, PTCH_S did not significantly suppress GLI-responsive luciferase activity or induce apoptosis, implying that this isoform does not have the expected function of a PTCH protein or the expression level of this isoform
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Fig. 5. Functional analysis of PTCH isoforms. (A) Inhibition of GLI-rcsponsive lucifcrasc activity by PTCH. 293T cells were transfected with various 
expression plasmids as indicated together with 8 x GLI-Luc containing eight GLI-binding sites and LTR-LacZ. After the transfection, cells were cultured in 
0.5% FCS for 1 6 h and then harvested for the lucifcrasc assay. Firefly hiciferase activity was normalized to fV-ga!actosidase activity from a cotransfected LTR- 
lacZ vector. Data are representative of three experiments with similar results. (B) PTCH-induced cell death as measured based on DNA content 293T cells were 
transfected with plasmids for PTCH or treated with cyclopamine or vehicle alone (cthanol). The induction of apoptosis was assessed by the increase in the 
subGO/Gl population compared with mock-transfected cells. (C) Protein levels of expressed genes. Cell rysates were obtained from 293T cells transfected with 
indicated plasmids and subjected to immunoblotting with an anti-c-Myc antibody. Tubulin is a loading control. The molecular weights of the four PTCH 
protein products predicted from the composition of amino acid residues, including the Myc tag, are as follows: PTCH L , 1 72 kDa; PTCH M » 163 kDa; PTCH S , 
154 kDa; PTCHAC, 32.2 kDa. (D) RT-PCR analysis of expressed genes. Total RNA was extracted from 293T cells transfected with plasmids for each isoform 
of PTCH and RT-PCR analysis was performed using primers depicted at the top. A forward primer was constructed in the linker region between the Myc tag 
and PTCH and a reverse primer in exon 2. Filled boxes indicate the position of the first transmembrane domain. GAPDH is an internal control for RT-PCR. (E) 
Metabolic labeling of the PTCH proteins. 293T cells transfected with a construct for PTCH were pulse-labeled with [ 33 S]methionme and chased for the 
indicated periods. 35 S-labelcd PTCH was immunoprecipitated, detected by autoradiography (top), and then quantified by phosphorimaging. Levels of labeled 
PTCH are plotted relative to the amount present at time 0 (bottom). 



is too low to cause these changes. To examine these 
possibilities, we first investigated the protein levels of each 
PTCH isoform by immunoblotting. Compared with PTCH L , 
PTCH M , and PTCHAC, the protein level of PTCH S was 
markedly reduced (Fig. 5C). The diffuse migration of PTCH 



proteins is thought to be due to glycosylation as reported 
[27,28], However, when RT-PCR was performed to analyze 
mRNA levels, these three isoforms were found to be 
expressed at comparable levels (Fig. 5D). These results 
indicate that the stability of PTCH S protein is compromised. 
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To determine whether the reduced activity of PTCH S was 
due to decreased protein stability, we measured die half-life 
of the three isoforms. 293T cells transfected with a plasmid 
for each isoform were metabolically labeled with [ 3 *S]- 
methionine and then incubated with excess unlabeled amino 
acids for various lengths of time. PTCH proteins were 
immunoprecipitated and size-separated by SDS-PAGE. As 
shown in Fig. 5E, Myc-tagged PTCH proteins were 
visualized at a point corresponding to approximately 
the same size as that detected by immunoblotting. Follow- 
ing a 180-min chase, 36 and 23% of de novo synthesized 
PTCH L and PTCH M , respectively, remained in 293T cells. 
Half-lives were calculated as 115 and 83 min, respectively. 
In contrast, the degradation of PTCH S was considerably 
accelerated, such that 5% of the protein remained at 1 80 min 
(half-life 26 min). These results indicated that PTCH S 
is an unstable protein compared with PTCH L and PTCHm. 
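Half-lives of this kind follow from treating the chase as first-order decay, in which the fraction remaining is f(t) = exp(-kt) and t1/2 = ln 2 / k. The sketch below is illustrative only, since the text does not state how the curves in Fig. 5E were fitted: it assumes data normalized to the t = 0 signal (as in the figure) and estimates k by least-squares regression of ln f on t through the origin. The function name and the time points are hypothetical.

```python
import math
from typing import Sequence


def fit_half_life(times_min: Sequence[float], fraction_remaining: Sequence[float]) -> float:
    """Estimate a first-order half-life (minutes) from pulse-chase data.

    Assumes f(t) = exp(-k * t) with f(0) = 1, so ln f = -k * t, and fits k
    by least squares through the origin.
    """
    if len(times_min) != len(fraction_remaining):
        raise ValueError("times and fractions must have the same length")
    if any(f <= 0 for f in fraction_remaining):
        raise ValueError("fractions must be positive to take logarithms")
    num = sum(t * math.log(f) for t, f in zip(times_min, fraction_remaining))
    den = sum(t * t for t in times_min)
    if den == 0:
        raise ValueError("need at least one time point after t = 0")
    k = -num / den  # decay constant, per minute
    return math.log(2) / k


if __name__ == "__main__":
    # Hypothetical chase schedule with synthetic data for an 83-min half-life.
    times = [0, 30, 60, 120, 180]
    fractions = [0.5 ** (t / 83) for t in times]
    print(round(fit_half_life(times, fractions), 1))  # 83.0
```

Fed a single end point, for example 36% remaining at 180 min, the same function returns roughly 122 min rather than the reported 115 min, presumably because the reported values draw on the whole time course; single-point estimates are only approximate.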



Discussion 

Alternative pre-mRNA splicing is an important mechanism for generating protein diversity and may explain in part how mammalian complexity arises from a surprisingly small complement of genes. It also plays important roles in development and disease. Recent studies have estimated that more than 55% of human genes are alternatively spliced [29] and that about 10% of the mutations in the human genome affect the canonical splice site sequence [30]. In particular, isoforms of genes with alternative first exons may have distinct mechanisms of expression. For example, the DSCR1 (Down syndrome candidate region 1)/MCIP1 (modulatory calcineurin-interacting protein 1) and nNOS (neuronal nitric oxide synthase) genes have four and eight alternative first exons, respectively, and are subject to distinct transcriptional regulation by separate promoters [31,32].

In this study, we identified and characterized five alternative first exons in both the human and mouse PTCH genes, encoding four protein species. Thus, arguably, PTCH is one of the most complex human genes in terms of diversity at the 5' end. Transcription of all major isoforms was upregulated by GLI1, an upstream transcription factor in the Shh pathway, although the degree of activation was cell type-specific. Unlike Drosophila ptc, for which only a single transcript has been reported and whose promoter has a cluster of three GLI-binding consensus sequences in a 130-bp region [22], human and mouse PTCH have three consensus sequences dispersed over 7.5 kb between exon 1a and exon 1b (Fig. 4A). Since exons 1b, 1c, 1d, and 1e are located close to each other, it is likely that the PTCH isoforms other than PTCHa are regulated by at least partially overlapping promoters, including GLI-BS1 in Fig. 4A. In contrast, exon 1a is located ~8 kb upstream of exon 1b; one of the GLI-binding sites is located inside exon 1a, and the other two are located far downstream. No GLI-binding consensus sequence was found in the promoter region of PTCHa (i.e., upstream of exon 1a), at least within 40 kb. Thus, taking our ChIP results into consideration, it is likely that two GLI-binding sequences, one in exon 1a and the other far downstream of exon 1a (GLI-BS3 and GLI-BS1 in Fig. 4A, respectively), are responsible for the GLI-mediated regulation of PTCHa. This is not unexpected because hepatocyte nuclear factor-3β, another target gene of Shh signaling, has a GLI-binding site 3' of its transcription unit, and this site is essential for the response to Shh [33]. Although no NBCCS families showing linkage to chromosomal regions other than 9q22.3-q31, where PTCH has been mapped, have been reported, a considerable number of NBCCS patients do not have mutations within the coding region of PTCH [34-36]. Therefore, our results warrant examining GLI-binding sequences for mutations in samples from such patients. Interestingly, PTCH2, another homologue of the Drosophila ptc gene, whose mutations are found in BCC and medulloblastoma [37], also has a GLI-binding consensus sequence ~470 bp upstream of the first methionine codon (based on the genomic sequence, AL136380), suggesting that PTCH2 is another target gene of the Shh pathway. Supporting this notion, PTCH2 is upregulated in basal cell carcinoma, in which Shh signaling is activated [38].

PTCH_L and PTCH_M were equally potent in suppressing GLI-mediated transcription and inducing apoptosis. In contrast, the PTCH_S protein was less potent because of its instability. Amino acid residues 101-119 of PTCH_L and 35-53 of PTCH_M comprise the first transmembrane domain, which is absent in PTCH_S because PTCH_S starts at the position corresponding to Met 152 of PTCH_L (Fig. 1D). This probably explains why PTCH_S is unstable. However, PTCH_S was more ubiquitously expressed in adult tissues than the other two isoforms, implying that, despite its instability, PTCH_S may be important for tissue homeostasis or tumor suppression. It is possible that a certain extracellular stress or stimulus, such as the binding of Shh, may stabilize PTCH_S. In contrast, PTCH_L expression was always predominant during embryonic development, indicating that PTCH_L plays a key role in embryogenesis.

The generation of mice in which one of the isoforms is nonfunctional may help clarify the roles of the alternative proteins in normal development and carcinogenesis. In this study, we focused on the usage of alternative first exons. However, some cell-surface receptors, such as CD44, undergo complex, combinatorial splicing that determines the function of the gene products [39]. Although the major transcripts of human and mouse PTCH are ~8 kb long [18,20], we have identified rare transcripts lacking exons 4 and 5 (K.N. and T.M., unpublished data). Therefore, a comprehensive study of alternative pre-mRNA splicing throughout the gene using cost-effective, high-throughput methods, such as polymerase colony technology [40] or exon junction microarrays [41], may shed light on the functional complexity of the PTCH gene and on carcinogenesis associated with increased Shh pathway activity.



Materials and methods 

Isolation of human PTCH isoforms and construction of plasmids

To obtain the 5' ends of cDNA, RNA ligase-mediated 5'-RACE was performed using the GeneRacer kit (Invitrogen) according to the manufacturer's directions. Random primers were used to reverse transcribe RNA. A reverse gene-specific primer was constructed in exon 2 to amplify the first-strand cDNA. The amplified cDNA was subcloned into pCR4-TOPO (Invitrogen) and sequenced. The expression plasmid encoding Myc-tagged PTCH_L (pMyc-Ptc1) was kindly provided by Dr. Jeffrey Ming. To make expression plasmids for PTCH_M and PTCH_S, a DNA fragment encoding the N-terminal region of PTCH_L was excised from pMyc-Ptc1 by digestion with EcoRI and replaced with the RT-PCR product encoding the N-terminal region of PTCH_M or PTCH_S, respectively. To make the luciferase constructs pGV-PTCH1, pGV-PTCH2, and pGV-PTCH3, fragments of the human PTCH promoter ranging from bp -1354 to -746, -4105 to -3808, and -8427 to -8032, respectively, relative to the reported transcription start site (GenBank Accession No. NM_000264), were subcloned into pGV-P2 (Wako Chemicals, Osaka, Japan). Mutated plasmids for these constructs were created by PCR-mediated mutagenesis as described previously [42]. The authenticity of all constructs was confirmed by DNA sequencing. The expression vector for FLAG-GLI1, pSRα-Flag-GLI1 [43], and the reporter vector, 8×GLI-Luc [33], were kindly provided by Drs. Alexandra L. Joyner and Hiroshi Sasaki, respectively.

Cell culture and transfections 

The human embryonic kidney cell line 293T and the mouse embryonal carcinoma cell line P19 were maintained in DMEM supplemented with 10% fetal calf serum (FCS), 50 U/ml penicillin, and 0.1 mg/ml streptomycin at 37°C in a humidified atmosphere of 5% CO2. The human neuroblastoma line SH-SY5Y, the oral squamous cell carcinoma lines HSC-2 and Ho-1-u-1, and the lung squamous cell carcinoma line LK-2 (obtained from the Cell Resource Center for Biomedical Research, Tohoku University, Japan) were maintained similarly except that RPMI 1640 medium was used. Cells were transfected with the indicated plasmids using Effectene reagent (Qiagen) and harvested 16 h after lipofection.

Analysis of PTCH isoform expression profiles 

Human and mouse PTCH cDNA was amplified by RT-PCR using 0.5 µg of total RNA purified from a panel of human tissues (Ambion and Clontech) or mouse embryos and the primers 5'-CTGGGAGAAGACGGAGGAGC-3' (exon 1a forward, human), 5'-CCCGGGAAATTAATAAAAGG-3' (exon 1a forward, mouse), 5'-GGACCGGGACTATCTGCACC-3' (exon 1b forward, human), 5'-GGACCGGGACTATCTGCACC-3' (exon 1b forward, mouse), 5'-CCTCTCCAGGAAAAGCAGCA-3' (exon 1c forward, human), 5'-GAGAAAGCAGCAGACAAGTGAAGGTTG-3' (exon 1c forward, mouse), 5'-ATCCATGTGGCTGCCCTCTT-3' (exon 1d forward, human), 5'-ATCCTTGTGGCCGCCCTCTT-3' (exon 1d forward, mouse), 5'-TTCTCGGCGGGGGTCCAGTT-3' (exon 1e forward, human), 5'-CCAGATGGACCACGGTTGCTGTAGATT-3' (exon 1e forward, mouse), 5'-CACAGCTCCTCCACGTTGGT-3' (exon 2 reverse, human), and 5'-CACAGCTCCTCCACGTTGGT-3' (exon 2 reverse, mouse). During the log phase of amplification (25-35 cycles depending on the template), 1 µl of the PCR product was applied onto a DNA LabChip (Agilent Technologies) and loaded into an Agilent 2100 bioanalyzer according to the manufacturer's protocol. Data analysis was performed with Agilent 2100 bioanalyzer software. The expression of PTCH was normalized to that of the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) or β-actin gene.
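The normalization step can be sketched as a ratio of each isoform-specific amplicon to the reference amplicon from the same sample. This is a hypothetical illustration, not the authors' procedure: it assumes the bioanalyzer software reports a band concentration for each amplicon, and the exon labels and values are invented. Such ratios are only meaningful when the PCR is sampled during the log phase of amplification, as the text specifies.

```python
from typing import Dict


def normalize_to_reference(isoform_conc: Dict[str, float], reference_conc: float) -> Dict[str, float]:
    """Divide each isoform amplicon concentration by the reference (e.g., GAPDH) amplicon."""
    if reference_conc <= 0:
        raise ValueError("reference concentration must be positive")
    return {name: conc / reference_conc for name, conc in isoform_conc.items()}


def isoform_fractions(normalized: Dict[str, float]) -> Dict[str, float]:
    """Express each isoform as a fraction of the total normalized PTCH signal."""
    total = sum(normalized.values())
    if total == 0:
        raise ValueError("no PTCH signal to apportion")
    return {name: value / total for name, value in normalized.items()}


if __name__ == "__main__":
    # Hypothetical band concentrations (ng/ul) for one tissue; GAPDH band = 10.0.
    bands = {"exon 1a": 2.0, "exon 1b": 6.0, "exon 1c": 1.0, "exon 1d": 0.5, "exon 1e": 0.5}
    norm = normalize_to_reference(bands, reference_conc=10.0)
    print(norm)
    print(isoform_fractions(norm))
```

The reference gene cancels out of the isoform fractions, so they are useful for comparing which first exon predominates in a tissue, while the normalized values themselves compare expression across tissues.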

Western blotting 

Immunoblot analysis was performed as described previously [44]. In brief, 30 µg of cell lysate was subjected to SDS-PAGE and transferred onto a nitrocellulose membrane. The membrane was incubated with anti-c-Myc (Santa Cruz, 9E10) or anti-Flag (Sigma, M2) mouse monoclonal antibody followed by horseradish peroxidase-conjugated anti-mouse immunoglobulins (DAKO). The proteins were visualized using enhanced chemiluminescence immunoblotting detection reagents (Amersham).

Luciferase assay 

293T or SH-SY5Y cells growing on six-well culture plates were cotransfected using Effectene reagent with various combinations of plasmids as indicated in the figure legends. Transfected cells were maintained in 0.5% FCS for 16 h and then harvested for the luciferase assay using the reagents and protocols provided by Promega or Wako Chemicals.
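The normalization described in the Fig. 5 legend (firefly luciferase divided by β-galactosidase activity from the cotransfected LTR-lacZ vector) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names are hypothetical, the readings are invented, and scaling results to the empty-vector (pcDNA3.0) wells is an assumption about presentation.

```python
from statistics import mean
from typing import List, Sequence


def normalized_activity(firefly: Sequence[float], beta_gal: Sequence[float]) -> List[float]:
    """Firefly luciferase reading divided by beta-galactosidase activity, well by well."""
    if len(firefly) != len(beta_gal):
        raise ValueError("firefly and beta-gal readings must be paired")
    return [f / b for f, b in zip(firefly, beta_gal)]


def relative_to_control(sample: Sequence[float], control: Sequence[float]) -> List[float]:
    """Scale normalized activities so that the mean of the control wells equals 1."""
    baseline = mean(control)
    return [x / baseline for x in sample]


if __name__ == "__main__":
    # Hypothetical triplicate wells: empty vector vs. a PTCH expression plasmid.
    vector = normalized_activity([9000, 9600, 8700], [1.5, 1.6, 1.45])
    ptch = normalized_activity([3000, 3300, 2900], [1.5, 1.65, 1.45])
    print([round(x, 2) for x in relative_to_control(ptch, vector)])
```

Dividing by the β-galactosidase signal corrects for well-to-well differences in transfection efficiency and cell number before activities are compared.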

Electrophoretic mobility shift assay 

To obtain the GST-GLI3 fusion protein, Escherichia coli strain BL21(DE3)pLysS (Novagen) was transformed with pGST-GLI3MF [45] (a gift from Dr. Shunsuke Ishii), which encodes the zinc finger region of GLI3. The fusion protein was purified by affinity chromatography using glutathione-Sepharose 4B (Amersham Pharmacia Biotech) according to the manufacturer's instructions. The 32P-labeled double-stranded oligonucleotide probes containing the sequence for a consensus GLI-binding site (5'-TTGCCTACCTGGGTGGTCTCTCTACTT-3', 5'-CTCAGCCCTGACCACCCAAGTCGAGCA-3', and 5'-GGCGCGGCAGACCACCCACGCCGAGGG-3') or a mutated sequence
(5'-TTGCCTACCL4G7GGy4TCTCTCTACTT-3 , > 5'-CT- 
CAGCCCTGArcC4,CrAAGTCGAGCA-3' f and 5'-GG- 
CGCGGCAGArCC4CrACGCCGAGGG-30 (mutated 
nucleotides in italic) were incubated with GST or the GST-GLI3 fusion protein (200 ng). The reaction was performed in 10 µl of binding buffer containing 4% glycerol, 1 mM MgCl2, 0.5 mM EDTA, 50 mM NaCl, 10 mM Tris-HCl (pH 7.5), and 0.05 mg/ml poly(dI-dC) for 20 min at room temperature. For competition experiments, a 50-fold molar excess of unlabeled double-stranded oligonucleotide containing either a GLI site or a mutated GLI site, as described above, was included in the binding reactions. Samples were fractionated on a nondenaturing 6% polyacrylamide gel and visualized by autoradiography.
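The 50-fold molar excess of competitor can be converted into a mass of unlabeled oligonucleotide. The sketch below assumes an average of about 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value from the text) and a hypothetical probe amount; the 27-bp length matches the probes listed above.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # approximate mass of one dsDNA base pair (assumption)


def competitor_ng(probe_pmol: float, length_bp: int, fold_excess: float = 50.0) -> float:
    """Mass (ng) of unlabeled duplex needed for a given molar excess over the probe."""
    competitor_mol = probe_pmol * fold_excess * 1e-12  # pmol -> mol
    grams = competitor_mol * length_bp * AVG_BP_MASS_G_PER_MOL
    return grams * 1e9  # g -> ng


if __name__ == "__main__":
    # Hypothetical 0.02 pmol of labeled 27-bp probe per 10-ul reaction.
    print(round(competitor_ng(0.02, 27), 2))
```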

Apoptosis assay 

293T cells were plated at 4 × 10^5 per well onto a six-well plate, grown for 16 h, transfected with plasmids for Myc-tagged PTCH or treated with cyclopamine (Toronto Research Chemicals, Inc.) (5 µM final concentration), and then grown in DMEM with 0.5% FCS. After 24 h, relative DNA content was determined by flow cytometry as described previously [46]. Cells having a reduced DNA content (sub-G0/G1) were regarded as apoptotic.
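Scoring apoptotic cells by reduced DNA content amounts to counting flow cytometry events below a gate placed under the G0/G1 peak. The sketch below assumes a gate at 80% of the G1 peak position, which is a hypothetical choice (the text defers to [46] for the actual procedure), and reports the increase over mock-transfected cells, as used for Fig. 5B.

```python
from typing import Sequence


def sub_g1_percent(dna_content: Sequence[float], g1_peak: float, cutoff_fraction: float = 0.8) -> float:
    """Percentage of events whose DNA content falls below the sub-G0/G1 gate."""
    if not dna_content:
        raise ValueError("no events")
    cutoff = g1_peak * cutoff_fraction  # hypothetical gate position
    below = sum(1 for x in dna_content if x < cutoff)
    return 100.0 * below / len(dna_content)


def apoptosis_increase(treated_pct: float, mock_pct: float) -> float:
    """Induction of apoptosis expressed as the rise in sub-G0/G1 events over mock."""
    return treated_pct - mock_pct


if __name__ == "__main__":
    # Hypothetical DNA-content values (arbitrary fluorescence units).
    mock = sub_g1_percent([98, 101, 100, 200, 75, 99, 102, 100, 199, 100], g1_peak=100)
    ptch = sub_g1_percent([40, 101, 60, 200, 75, 99, 30, 100, 199, 100], g1_peak=100)
    print(mock, ptch, apoptosis_increase(ptch, mock))
```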

Pulse-chase experiment 

293 cells were plated at 8 × 10^5 cells per 60-mm plate, cultured for 24 h, and transfected with 1 µg of PTCH expression plasmid using the Lipofectamine Plus reagent kit (Invitrogen) according to the manufacturer's instructions. Twenty-four hours after transfection, cells were incubated with DMEM lacking methionine (-Met) for 30 min and then with 16.7 µCi/ml L-[35S]methionine in DMEM -Met for 2 h. Cells were washed three times with phosphate-buffered saline (PBS) and incubated with DMEM supplemented with 2 mM methionine for varying periods. At each time point, cells were scraped, washed with PBS, and lysed in 300 µl of lysis buffer containing 150 mM NaCl, 1% Triton X-100, 10 mM Tris-HCl (pH 7.4), 5 mM EDTA, 1 mM PMSF, 18 µg/ml aprotinin, 50 ng/ml leupeptin, 1 mM benzamidine, and 0.7 ng/ml pepstatin. The extracts were centrifuged at 16,000 × g for 15 min at 4°C, and the supernatants (200 µl) were immunoprecipitated for 16 h with 20 µl of protein-A/G agarose (Santa Cruz) and 3 µl of anti-c-Myc antibody. The immunoprecipitates were washed three times with 1 ml of lysis buffer, solubilized in 20 µl of 1× Laemmli buffer by heating at 95°C for 5 min, and resolved on a 5-20% gradient polyacrylamide gel. Gels were dried, exposed, and analyzed using a FUJIX BAS2000 imaging analyzer (Fuji Film).



ChIP assay 

ChIP assay was performed using the acetyl-histone H3 ChIP Assay Kit (Upstate Biotechnology), as recommended by the manufacturer, except that monoclonal anti-Flag (M2) and anti-Myc (9E10) antibodies were used in this study. 293T cells were plated at 1 × 10^6 cells per 10-cm dish, grown for 16 h, and then transfected with pFlag-GLI1 or pCI-Flag (encoding the Flag-tag epitope). After 24 h, genomic DNA and protein were cross-linked by addition of formaldehyde (1% final concentration) directly to the culture medium and incubation for 10 min at 37°C. Cells were lysed in 1 ml of SDS lysis buffer containing 1% SDS, 10 mM EDTA, and 50 mM Tris-HCl (pH 8.1) and sonicated to generate 300- to 1000-bp DNA fragments. After centrifugation, the cleared supernatant was diluted 10-fold with ChIP dilution buffer and incubated with the specific antibody at 4°C for 16 h with rotation before incubation with protein A-Sepharose beads at 4°C for 1 h with rotation. Immune complexes were precipitated, washed, and eluted as recommended. DNA-protein cross-links were reversed by heating to 65°C for 2.5 h. DNA was phenol extracted, ethanol precipitated, and resuspended in 20 µl of Tris-EDTA. We used 2.5 µl of each sample as a template for PCR. PCR amplification was performed using primers that flank the putative GLI-response elements: 5'-AAAGGCTGGAGCTCCCGCCC-3' (GLI-BS1, forward) and 5'-TGCGCGCAAAGGCATCCCAC-3' (GLI-BS1, reverse), 5'-GGGCATGCATATTAAAGCCG-3' (GLI-BS2, forward) and 5'-CGAGCGCTATCTTAATCTCC-3' (GLI-BS2, reverse), and 5'-AGCGCCTGTTTACCCAGGAG-3' (GLI-BS3, forward) and 5'-GCTCCTCCGTCTTCTCCCAG-3'
(GLI-BS3, reverse).

ABSTRACT

The identification of circulating tumor antigens or their related autoantibodies provides a means for early cancer diagnosis as well as leads for therapy. We have used a proteomic approach to identify proteins that commonly induce a humoral response in pancreatic cancer. Aliquots of solubilized proteins from a pancreatic cancer cell line (Panc-1) were subjected to two-dimensional PAGE, followed by Western blot analysis in which sera of individual patients were tested for primary antibodies. Sera from 36 newly diagnosed patients with pancreatic cancer, 18 patients with chronic pancreatitis, 33 patients with other cancers, and 15 healthy subjects were analyzed. Autoantibodies against either one or two calreticulin isoforms, identified by mass spectrometry, were detected in sera from 21 of 36 patients with pancreatic cancer. One of 18 patients with chronic pancreatitis and 1 of 15 healthy controls demonstrated autoantibodies to calreticulin isoform 1; none demonstrated autoantibodies to isoform 2. None of the sera from patients with colon cancer exhibited reactivity against either of these two proteins. One of 14 sera from patients with lung adenocarcinoma demonstrated autoantibodies to calreticulin isoform 1; 2 of 14 demonstrated autoantibodies to isoform 2. Immunohistochemical analysis of calreticulin in pancreatic/ampullary tumor tissue arrays using an isoform-nonspecific antibody revealed diffuse and consistent cytoplasmic staining in the neoplastic epithelial cells of the pancreatic and ampullary adenocarcinomas. The detection of autoantibodies to calreticulin isoforms may have utility for the early diagnosis of pancreatic cancer.

INTRODUCTION 

There is, at present, much interest in identifying markers for the early detection of pancreatic cancer. We have implemented a proteomics-based approach to identify tumor markers based on their occurrence as tumor antigens that elicit a humoral response during tumorigenesis. The humoral immune response to cancer in humans has been well demonstrated by the identification of autoantibodies to a number of different intracellular and surface antigens in patients with various tumor types (1-4).

Pancreatic cancer has the worst prognosis of all cancers, with a 5-year survival rate of <3%, accounting for the fourth largest number of cancer deaths in the United States (5). It occurs with a frequency of around 9 patients/100,000 individuals, making it the 11th most common cancer in the United States. The poor prognosis for pancreatic cancer is due, in part, to lack of early diagnosis. There is currently no effective biomarker-based strategy useful for the early detection of pancreatic cancer or even to differentiate between pancreatic adenocarcinoma and chronic pancreatitis. In pancreatic cancer, autoimmunity has been shown against several proteins, including MUC1 (6), p53 (7), and Rad51 (8). MUC1 is a transmembrane glycoprotein involved in cell-cell and cell-extracellular matrix interactions, and MUC1 autoantibodies have been observed in sera from patients with a variety of different tumors (9). In pancreatic cancer, the presence of MUC1 IgG autoantibodies has been shown to be associated with a favorable prognosis (6). The presence of p53 autoantibodies has been observed in 18.2% of patients with pancreatic cancer. However, p53 autoantibodies were also found in 53% of patients with acute pancreatitis and 12.1% of patients with chronic pancreatitis; thus, the humoral response to p53 was not specific to malignancy. The recombination factor Rad51 is highly expressed in pancreatic adenocarcinoma (10), and Rad51 autoantibodies have been observed in 7% of patients with pancreatic cancer.

It is not clear why only a subset of patients with a particular tumor type develop a humoral response to a particular antigen. Immunogenicity may depend on the level of expression, posttranslational modification, or other types of protein processing, the extent of which may vary among tumors of a similar type. Other factors that may influence the immune response include variability among tumors and individuals in MHC molecules and in antigen presentation. A large number of autoantibodies have been identified in different tumor types, but in most cases they occur in less than 50% of patients' sera. Therefore, they are not effective individually for the early detection of cancer. Panels of such autoantibodies directed against a variety of tumor antigens may instead be effective (11).

The identification of panels of tumor antigens that elicit an immune response may have utility in early cancer diagnosis, in establishing prognosis, and in immunotherapy against the disease. Several approaches are currently available for the identification of tumor antigens. In contrast to identification of tumor antigens based on analysis of recombinant proteins, the proteomic approach that we have used allows the identification of autoantibodies to proteins as they occur in their natural states, in lysates prepared from tumors and tumor cell lines. This technology may uncover antigenicity associated with aberrant post-translational modification of tumor cell proteins. The goal of this study was to implement a proteomic approach for the identification of tumor antigens that elicit a humoral response in pancreatic cancer, using the pancreatic cancer cell line Panc-1. To this end, we used two-dimensional PAGE to simultaneously separate individual cellular proteins from the Panc-1 cell line. The separated proteins were transferred onto polyvinylidene difluoride membranes. Sera from cancer patients were screened individually by Western blot analysis for antibodies that reacted against the separated proteins. Proteins specifically reacting with sera from cancer patients were identified by mass spectrometry. We have identified two calreticulin isoforms as proteins that commonly elicit an antibody response in pancreatic cancer.



MATERIALS AND METHODS 

Sera and Cell Lines. Serum and tumor tissue were obtained at the time of diagnosis following informed consent. The experimental protocol was approved by The University of Michigan Institutional Review Board. Sera were obtained from 36 patients with pancreatic cancer (all of advanced stage). Sera from 18 patients with chronic pancreatitis, from 15 healthy individuals, and from 33 patients with other cancers (14 with lung cancer and 19 with colon cancer) were used as controls. All subjects who donated sera for this study were between 57 and 74 years of age. The human cancer cell lines used in this study were all individually cultured in DMEM supplemented with 10% fetal bovine serum, 100 units/ml penicillin, and 100 units/ml streptomycin (Invitrogen, Carlsbad, CA).

Two-Dimensional PAGE and Western Blot Analysis. After excision, the tumor tissue was immediately frozen at -80°C, after which an aliquot was lysed in solubilization buffer [8 M urea (Bio-Rad), 2% NP40, 2% carrier ampholytes (pH 4-8; Gallard/Schlesinger, Carle Place, NY), 2% β-mercaptoethanol, and 10 mM phenylmethylsulfonyl fluoride] and stored at -80°C until use. Cultured Panc-1 pancreatic adenocarcinoma cells were harvested in 300 µl of solubilization buffer by using a cell scraper and stored at -80°C until use. Proteins derived from the extracts of either cultured cells or solid tumors were separated in two dimensions as described previously (12). In brief, solubilized proteins were applied onto isoelectric focusing gels. Isoelectric focusing was performed using pH 4-8 carrier ampholytes at 700 V for 16 h, followed by 1000 V for an additional 2 h. The first-dimension gel was loaded onto the second-dimension gel after equilibration in 125 mM Tris (pH 6.8), 10% glycerol, 2% SDS, 1% DTT, and bromphenol blue. For the second-dimension separation, an 11-14% acrylamide gradient (Crescent Chemical, Hauppauge, NY) was used. Proteins were transferred to an Immobilon-P polyvinylidene difluoride membrane (Millipore, Bedford, MA) or visualized by silver staining of the gels.
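The two-step focusing program can be summarized as a volt-hour total, a quantity often used to compare IEF runs. This summary is derived here from the stated voltages and times and is not given in the original:

```latex
\text{V}\cdot\text{h}_{\text{total}}
  = (700\ \text{V})(16\ \text{h}) + (1000\ \text{V})(2\ \text{h})
  = 11\,200 + 2\,000
  = 13\,200\ \text{V}\cdot\text{h}
```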

Western Blotting. After transfer, membranes were incubated with a blocking buffer consisting of 10 mM Tris-HCl (pH 7.5), 50 mM NaCl, 1.8% nonfat dry milk, and 0.01% Tween 20 for 2 h. The membranes were incubated for 1 h at room temperature with serum obtained from either patients or healthy individuals as a source of primary antibody at a 1:100 dilution. After three washes with washing buffer (Tris-buffered saline containing 0.01% Tween 20), the membranes were incubated with horseradish peroxidase-conjugated sheep anti-human IgG antibodies (Amersham Biosciences, Piscataway, NJ) at a dilution of 1:1000 for 1 h at room temperature. Immunodetection was accomplished by enhanced chemiluminescence (Amersham Biosciences) followed by autoradiography on Hyperfilm MP (Amersham Biosciences).

Calreticulin Detection by Western Blotting. A rabbit anti-calreticulin polyclonal antibody (Affinity BioReagents, Golden, CO) was used at a 1:1000 dilution for Western blotting and was processed as for incubations with patient sera, with a horseradish peroxidase-conjugated anti-rabbit IgG (Amersham Biosciences) as the secondary antibody.

In-Gel Enzyme Digestion and Mass Spectrometry. For protein identification by mass spectrometry, two-dimensional gels were stained by a modified silver-staining method, and excised proteins were destained for 5 min in 13 mM potassium ferricyanide and 50 mM sodium thiosulfate as described previously (13). After three washes with water, the gel pieces were dehydrated in 100% acetonitrile for 5 min and then dried. Digestion was performed by the addition of 100 ng of trypsin (Promega, Madison, WI) in 200 mM ammonium bicarbonate. After enzymatic digestion overnight at 37°C, the peptides were extracted twice with 50 µl of 60% acetonitrile/1% trifluoroacetic acid. After removal of acetonitrile by centrifugation in a vacuum centrifuge, the peptides were concentrated by using C18 pipette tips (Millipore) and identified by nanoflow capillary liquid chromatography coupled with electrospray quadrupole time-of-flight tandem mass spectrometry on a Q-TOF micro (Micromass, Manchester, United Kingdom). The acquired spectra were processed and searched against a nonredundant SwissProt protein sequence database using ProteinLynx Global Server.

RNA Isolation. Samples of normal pancreas were taken from organ donors provided by the Michigan Transplantation Society (five) or from areas outside regions of pathology in surgically resected pancreata (two). All of the pancreatic cancers were of advanced stage. All samples were processed in a similar manner. Frozen samples were embedded in OCT freezing medium (Miles Scientific, Naperville, IL) and cryotome sectioned (5 µm), and routine H&E stains were evaluated by a surgical pathologist. Areas of relatively pure tumor (at least 70% tumor cells) or normal tissue were microdissected, and these areas were selected for RNA isolation. All grades of differentiation were exhibited by the tumors.

Lung cancer         14    1 (7.1%)    2 00%)    2 (14.3%)
Colon cancer        19    0 (0%)      0 (0%)    0 (0%)
Healthy subjects    15

Isolates of human tumor tissue and human tumor cell lines were homogenized in the presence of TRIzol reagent (Invitrogen), and total cellular RNA was purified according to the manufacturer's procedure. RNA samples were further purified using acid phenol extraction and RNeasy spin columns (Qiagen, Valencia, CA). RNA quality was assessed by 1% agarose gel electrophoresis in the presence of ethidium bromide.

Gene Expression Profiling and Statistical Analysis. This study used
commercially available high-density oligonucleotide microarrays (U133A;
Affymetrix, Santa Clara, CA). Hybridization, scanning, and image analysis of
the arrays were performed according to the manufacturer's protocols and as
described previously (14, 15). The U133A array consists of 22,283 probe sets,
each representing a transcript. Each probe set typically consists of 11
perfectly complementary 25-base-long probes (PM) as well as 11 mismatch
probes (MM) that are identical except for an altered central base. A normal
pancreas sample was selected as the standard, and probe pairs for which
PM - MM ≤ 100 on the standard were excluded from additional analysis. The
average of the middle 50% of the PM - MM differences was used as the
expression measure for each probe set. A quantile normalization procedure
was used to adjust for differences in the probe intensity distribution
across different chips. In brief, we applied a monotone linear spline to
each chip that mapped quantiles 0.01 up to 0.99 (in increments of 0.01)
exactly to the corresponding quantiles of the standard. For statistical
analysis, we first transformed each normalized probe-set expression value,
x, to log[100 + max(x + 100, 0)], which we found stabilized the within-group
variances between high- and low-expression probe sets. To compare normal and
tumor samples, we performed a one-way ANOVA, modeling the log-transformed
values for each probe set as having separate means for each group. We
calculated fold changes between groups of samples by first replacing mean
expression values < 100 units by 100 to avoid negative values or spuriously
large fold changes. Code to perform these computations is freely available.⁷
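The probe-set summary, quantile normalization, variance-stabilizing transform, and floored fold change described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' freely available code: the exact trimming rule for "middle 50%" (here, drop the lowest and highest quarter, rounded down), linear extrapolation beyond the 0.01 and 0.99 knots, the natural logarithm, and all function names are assumptions.

```python
import bisect
import math
from statistics import mean


def standard_mask(pm_std, mm_std, threshold=100):
    """True for probe pairs retained, i.e. PM - MM > threshold on the standard."""
    return [p - m > threshold for p, m in zip(pm_std, mm_std)]


def expression_measure(pm, mm, keep=None):
    """Average of roughly the middle 50% of PM - MM differences.

    The lowest and highest quarter (rounded down) of the sorted differences
    are dropped; `keep` optionally masks out excluded probe pairs.
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    if keep is not None:
        diffs = [d for d, k in zip(diffs, keep) if k]
    diffs.sort()
    cut = len(diffs) // 4
    return mean(diffs[cut:len(diffs) - cut])


def _quantile(sorted_vals, q):
    """Linearly interpolated quantile of already-sorted data (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    if i + 1 >= len(sorted_vals):
        return sorted_vals[-1]
    return sorted_vals[i] + (pos - i) * (sorted_vals[i + 1] - sorted_vals[i])


def normalize_to_standard(chip, standard):
    """Map quantiles 0.01..0.99 of `chip` exactly onto those of `standard`
    with a monotone piecewise-linear function; values outside the outer
    knots are extrapolated along the end segments (an assumption)."""
    qs = [k / 100 for k in range(1, 100)]
    cs, ss = sorted(chip), sorted(standard)
    xs = [_quantile(cs, q) for q in qs]
    ys = [_quantile(ss, q) for q in qs]
    out = []
    for v in chip:
        # pick the knot segment containing v (or the nearest end segment)
        j = min(max(bisect.bisect_left(xs, v), 1), len(xs) - 1)
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        out.append(y0 if x1 == x0 else y0 + (v - x0) * (y1 - y0) / (x1 - x0))
    return out


def transform(x):
    """log[100 + max(x + 100, 0)]; the natural log is assumed here."""
    return math.log(100 + max(x + 100, 0))


def fold_change(group_a, group_b, floor=100):
    """Ratio of group means (b over a), raising means below 100 units to 100."""
    return max(mean(group_b), floor) / max(mean(group_a), floor)
```

For example, a chip whose values are all exactly twice the standard's is mapped back onto the standard's values, and `fold_change([50, 80], [400, 600])` gives 5.0 because the first mean (65) is raised to 100.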

Determination of Calreticulin mRNA Levels Using Real-Time PCR.
Five pancreatic, four lung, three colon, and two ovarian cancer cell lines
were used to compare the mRNA expression level of calreticulin. Expression
levels were normalized to glyceraldehyde-3-phosphate dehydrogenase (GAPDH)
mRNA expression. Oligonucleotide primers and TaqMan probes were designed
using the LightCycler Probe Design Software (Roche Applied Science). Forward
and reverse primers for human calreticulin were 5'-CGCCATOCrGCrATTJC-3' and
5'-CATAAAAGCGTGCATCCT-3', respectively (Applied Biosystems). The nucleotide
sequences of the forward and reverse primers for GAPDH were
5'-GAAGGTGAAGGTCGGAGTC-3' and 5'-GAAGATGGTGATGGGATTTC-3', respectively
(Applied Biosystems).

The first-strand cDNA was synthesized with the SuperScript First-Strand
Synthesis System for reverse transcription-PCR according to the
manufacturer's instructions (Invitrogen). Quantitative PCR was carried out in
96-well optical reaction plates using cDNA derived from 50 ng of total RNA
for each sample in a volume of 25 μl. PCR was performed on the ABI Prism 7700
Sequence Detector (Applied Biosystems). The cycling conditions were 10 min at
95°C followed by 55 cycles of 95°C for 30 s, …°C for 45 s, and 72°C for 15 s.

Fig. 1. A silver-stained image of the Panc-1 pancreatic tumor cell line (A)
compared with a Western blot of the Panc-1 cell line with normal serum (B)
and serum from a patient with pancreatic cancer (C).

To control for the variation in the amount of starting RNA among samples, we
performed amplification of GAPDH mRNA as an internal reference against which
other RNA values were normalized. Additionally, the real-time PCR products
were purified with the QIAquick Gel Extraction kit (Qiagen) and subjected to
DNA sequencing to verify their identity.
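The text does not say how the GAPDH-normalized ratio was derived from the real-time PCR readout. A common choice, assumed here, is the comparative threshold-cycle (2^-ΔCt) method, which treats both amplicons as doubling every cycle. The sketch below computes that ratio and the mean ± SD over replicate experiments; the function names and Ct values are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Target:reference ratio from threshold cycles (2^-dCt when efficiency = 2).

    Assumes both amplicons amplify with the same efficiency per cycle.
    """
    return efficiency ** -(ct_target - ct_reference)


def summarize(replicates):
    """Mean and SD of the ratio over replicate experiments.

    `replicates` is a list of (ct_calreticulin, ct_gapdh) pairs.
    """
    ratios = [relative_expression(t, r) for t, r in replicates]
    return mean(ratios), stdev(ratios)


# Hypothetical Ct values for one cell line, five independent experiments
runs = [(22.1, 18.0), (22.3, 18.1), (21.9, 17.9), (22.0, 18.0), (22.2, 18.2)]
m, sd = summarize(runs)
print(f"calreticulin:GAPDH = {m:.4f} +/- {sd:.4f}")
```

If amplification efficiencies were measured and differed from 2, the `efficiency` argument (or a per-amplicon efficiency correction) would replace the idealized doubling.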

Pancreatic/Ampullary Tumor Tissue Array and Immunohistochemistry.
A tissue array containing triplicates of 4 normal pancreas, 12 nonpancreas
normal tissues, 47 pancreatic adenocarcinomas, 31 ampullary adenocarcinomas,
and 2 large cell anaplastic carcinomas was constructed as described
previously (9). The cases were randomly selected from the University of
Michigan Pathology archives. Immunohistochemistry for calreticulin was
performed using the same rabbit polyclonal antibody (30-min incubation at
room temperature) at 1:200, with citrate buffer (pH 6.0) and microwave
antigen retrieval (10 min), on the Dako automated instrument (DakoCytomation,
Carpinteria, CA). Primary antibody was detected using the Dako EnVision kit.



Fig. 2. Western blot analysis of calreticulin with a polyclonal
anti-calreticulin antibody (A) and serum from a pancreatic cancer patient (B).
[Panels: Anti-calreticulin Antibody; Pancreatic Cancer Serum]

Fig. 3. Tandem mass spectrometry identification of calreticulin isoform 1.
The tandem mass spectrometry spectrum of calreticulin isoform 1 (obtained
after trypsin digestion) is shown by analysis with electrospray
quadrupole-time of flight coupled with nanoflow capillary high-performance
liquid chromatography. The precursor ion shown in the figure is m/z
840.4000, and the resultant peaks were searched against the nonredundant
SwissProt protein sequence database using the ProteinLynx Global Server.

m/z         Mass        Delta   Score   Start   End   Sequence
840.4000    2518.1860   0.01    12.74   186     207   (K)IDNSQVESGSLEDDWDFLPPKK(I)
1196.0524   2390.0910   -0.02   5.73    186     206   (K)IDNSQVESGSLEDDWDFLPPK(K)

[Panel: calreticulin amino acid sequence, residues 1-417, with the matched
peptides indicated]

RESULTS

Pancreatic Tumor Proteins Recognized Specifically by Sera from Newly
Diagnosed Patients with Pancreatic Cancer. Panc-1 pancreatic tumor cell line
proteins were separated by two-dimensional PAGE and transferred onto
Immobilon-P polyvinylidene difluoride membranes. Sera obtained from 36 newly
diagnosed patients with pancreatic cancer, from 18 patients with chronic
pancreatitis, from 33 patients with other types of cancers, and from 15
healthy donors were screened individually for the presence of antibodies to
Panc-1 pancreatic tumor cell line proteins (Table 1). Each membrane was
treated with one serum sample as the primary antibody and with sheep
anti-human IgG as the secondary antibody. In general, most pancreatic cancer
patient sera reacted against multiple spots (Fig. 1; Fig. 2). Some of the
reactive protein spots were also observed with the control sera and were
therefore considered to represent nonspecific reactivity. The reactive
proteins most commonly observed with pancreatic cancer patient sera, but not
with noncancer controls, included two proteins (spots 1 and 2) with an
estimated molecular mass of 55-60 kDa and an isoelectric point of 4.4. These
two proteins frequently showed concordant reactivity with the same sera,
which, given their close proximity in two-dimensional gels, suggested that
they represented isoforms of the same protein. The protein from spot 1
showed reactivity with sera from 17 of 36 patients with pancreatic cancer
(47.2%), with sera from 1 of 18 patients with chronic pancreatitis (5.6%),
and with sera from 1 of 15 healthy donors (6.7%). The protein from spot 2
showed reactivity in 16 of 36 patients with pancreatic cancer (44.4%), in 0
of 18 patients with chronic pancreatitis, and in 0 of 15 healthy donors
(Table 1). Sera from 21 of 36 pancreatic cancer patients (58.3%) reacted with
one or both spots. Reactivity against the protein in spot 1 was found in sera
from 1 of 14 patients with lung cancer, and reactivity against the protein in
spot 2 in 2 of 14 lung cancer patients. None of the sera from 19 colon cancer
patients reacted with the protein in either spot (1 or 2).
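As a quick arithmetic check, each seroreactivity percentage in the preceding paragraph follows directly from the reported counts of reactive sera over sera tested. The short sketch below recomputes them; the dictionary labels are illustrative only.

```python
# Reactive sera / sera tested, as reported in the Results text
COUNTS = {
    "pancreatic cancer, spot 1": (17, 36),
    "pancreatic cancer, spot 2": (16, 36),
    "pancreatic cancer, spot 1 and/or 2": (21, 36),
    "chronic pancreatitis, spot 1": (1, 18),
    "healthy donors, spot 1": (1, 15),
    "lung cancer, spot 1": (1, 14),
    "lung cancer, spot 2": (2, 14),
    "colon cancer, spot 1 or 2": (0, 19),
}


def percent(reactive, tested):
    """Percentage of reactive sera, rounded to one decimal place."""
    return round(100 * reactive / tested, 1)


for group, (k, n) in COUNTS.items():
    print(f"{group}: {k} of {n} ({percent(k, n)}%)")
```

For instance, 1 of 15 healthy donors corresponds to 6.7%, and 21 of 36 pancreatic cancer patients to 58.3%.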

Identification of the Reactive Proteins as Isoforms of Calreticulin. The
proteins of interest were extracted from the gels after two-dimensional PAGE
and silver staining. The proteins were digested with trypsin, and the
resulting peptides were analyzed by electrospray quadrupole-time of flight
tandem mass spectrometry. The acquired spectra were processed and searched
against a nonredundant SwissProt protein sequence database using ProteinLynx
Global Server.⁶ The two proteins were identified (Fig. 3) as isoforms of
calreticulin (SwissProt accession no. P27797). Identity with calreticulin
was confirmed by two-dimensional Western blotting using Panc-1 whole-cell
extracts and an anti-calreticulin rabbit polyclonal antibody.
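The peptide masses tabulated with Fig. 3 can be checked from the sequences of the matched calreticulin peptides, IDNSQVESGSLEDDWDFLPPK (residues 186-206) and its missed-cleavage form IDNSQVESGSLEDDWDFLPPKK (186-207). The sketch below sums standard monoisotopic residue masses plus one water to obtain each neutral mass and then computes the precursor m/z. The charge states (3+ for the m/z 840.4 ion, 2+ for the m/z 1196.05 ion) are an assumption, since the text does not state them; with them, the computed masses agree with the tabulated values to within about 0.001 Da.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def precursor_mz(mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


for seq, z in [("IDNSQVESGSLEDDWDFLPPKK", 3), ("IDNSQVESGSLEDDWDFLPPK", 2)]:
    m = peptide_mass(seq)
    print(f"{seq}: M = {m:.4f}, m/z ({z}+) = {precursor_mz(m, z):.4f}")
```

The two peptides differ by exactly one lysine residue (about 128.095 Da), which matches the difference between the two tabulated masses.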



[Fig. 4 legend: cell lines 1-5, pancreatic (including Capan-1); 6-9, lung
(including A549); 10-12, colon; 13-14, ovarian (SKOV3, OVCAR-3)]

Fig. 4. Calreticulin mRNA levels measured by real-time PCR in pancreatic,
lung, colon, and ovarian tumor cell lines, expressed as the
calreticulin:GAPDH ratio, as described in "Materials and Methods." Each bar
represents the mean ± SD of five independent experiments.
[Fig. 5 groups: Chronic Pancreatitis; Normal Pancreas; Pancreatic
Adenocarcinoma]

Fig. 5. Microarray expression of calreticulin mRNA in chronic pancreatitis
(4), normal pancreas (7), and pancreatic adenocarcinoma (8). Each bar
represents the mean ± SD.

Role of Glycosylation in Calreticulin Antigenicity. We sought to determine
whether calreticulin glycosylation contributed to immunogenicity.
Solubilized proteins from the Panc-1 pancreatic tumor cell line were
subjected to deglycosylation by a combination of endoglycosidase F,
endo-α-N-acetylgalactosaminidase, and α2-3,6,8,9-neuraminidase. The
resulting products were separated by SDS electrophoresis and analyzed by
Western blotting. Although the deglycosylated positive control showed a
clear mobility shift by SDS-PAGE, the deglycosylating enzyme treatment did
not produce any mobility shift of calreticulin. Thus, endoglycosidase
H-sensitive glycosylation does not appear to play a role in the observed
immunogenicity of the calreticulin isoforms (data not shown).

mRNA Expression of Calreticulin. To examine whether the immunogenicity of
calreticulin in pancreatic cancer could be due to elevated transcriptional
mechanisms, the expression of calreticulin mRNA was examined in different
cell lines and tumor tissues. To examine calreticulin expression in all cell
lines, including five pancreatic tumor cell lines, four lung tumor cell
lines, three colon tumor cell lines, and two ovarian tumor cell lines, we
performed real-time PCR using the expression level of GAPDH as an internal
control. After normalization, the calreticulin:GAPDH ratio was calculated
for each cell line (Fig. 4). In general, we found that the level of mRNA
expression in the pancreatic tumor cell lines was significantly higher than
in the other cell lines examined, suggesting that overexpression of
calreticulin may be a contributing factor in its immunogenicity. Therefore,
we examined calreticulin expression in eight pancreatic adenocarcinomas, in
four samples of chronic pancreatitis, and in seven samples of normal
pancreas by microarray analysis (Fig. 5). The expression of calreticulin
mRNA was approximately 2-fold higher in pancreatic tumors than in normal
pancreas (P = 0.006). It is important to note, however, that the pancreatic
adenocarcinomas were microdissected and are derived from ductal epithelium.
Because the normal pancreas is primarily acinar, the difference in gene
expression noted in the pancreatic tumors may be entirely related to the
differences in the epithelium analyzed rather than to any differences that
arose in the tumors.
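The P value above presumably comes from the one-way ANOVA on log-transformed expression values described under Gene Expression Profiling and Statistical Analysis. A minimal sketch of the F statistic for such a comparison follows; the function name is hypothetical and no data from the study are used. It returns only the statistic: a P value would be obtained from the F distribution with (k - 1, n - k) degrees of freedom, for example with SciPy's `scipy.stats.f.sf(F, k - 1, n - k)`. With only two groups, F equals the square of the pooled-variance t statistic.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    `groups` is a list of lists of (already log-transformed) expression
    values, one list per sample group, each group having its own mean.
    """
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # variation of values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy example with two groups of three values each
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```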

Analysis of Calreticulin Expression by Two-Dimensional PAGE. We hypothesized
that changes in the levels of total calreticulin protein or of its isoforms
might lead to antigenicity in pancreatic cancer. Using two-dimensional PAGE,
we examined the expression of calreticulin isoforms 1 and 2 in a variety of
tissues and tumor types. All calreticulin isoforms were present at similar
expression levels in different cell lines, including 6 pancreatic tumor cell
lines, 4 lung tumor cell lines, 9 colon tumor cell lines, and 33 ovarian
tumor cell lines. A similar pattern of expression was also observed in 6
pancreatic tumors, 38 lung tumors, 7 colon tumors, and 25 ovarian tumors
(Fig. 6). All isoforms were also observed in a variety of normal tissues and
in gastric, esophageal, and brain tumors (data not shown). These results
suggest that all calreticulin isoforms, including isoforms 1 and 2, were
ubiquitously expressed and that the level of protein expression was unlikely
to contribute to the antigenicity of calreticulin.

Fig. 6. Calreticulin protein levels measured in human tumors (T) and tumor
cell lines (CL). Two-dimensional gels were prepared from a variety of human
tumors and tumor cell lines, as described in "Materials and Methods."
Background-corrected integrated intensity (volume) was quantified (Visage
software; Genomic Solutions, Ann Arbor, MI) for total calreticulin,
calreticulin isoform 1, and calreticulin isoform 2 and was normalized to the
values obtained from the average of two … Bars represent the average
intensities for 38 lung tumors, 6 pancreatic tumors, 7 colon tumors, 25
ovarian tumors, 4 lung cell lines, 6 pancreatic cell lines, 9 colon cell
lines, and 33 ovarian cell lines.

Fig. 7. Immunohistochemical staining of calreticulin using a rabbit
polyclonal antibody. Two examples (A and B) of pancreatic ductal
adenocarcinoma are shown, demonstrating predominantly cytoplasmic
immunoreactivity in the invasive neoplastic cells. Tumor (B) also contains a
pancreatic duct with low-grade pancreatic intraductal neoplasia (PanIN-1),
which shows less intense calreticulin immunoreactivity. C, normal pancreas
showing a small benign duct with minimal immunoreactivity (small arrow), a
normal pancreatic islet with intense immunoreactivity (large arrow), and the
remaining tissue composed of exocrine pancreas with moderate
immunoreactivity.

Immunohistochemical Analysis of Calreticulin. Calreticulin expression in
pancreatic and ampullary tumors was assessed by immunohistochemistry
(Fig. 7, A and B) using a rabbit polyclonal anti-calreticulin antibody and
the pancreatic/ampullary tumor tissue array. Diffuse and consistent
cytoplasmic immunoreactivity for calreticulin was observed in the majority
of the pancreatic and ampullary adenocarcinomas. There were no significant
staining differences with regard to tumor differentiation. Normal pancreatic
ductal epithelium exhibited minimal reactivity, whereas normal pancreatic
islets exhibited intense immunoreactivity and normal exocrine pancreas
exhibited moderate reactivity (Fig. 7C).

DISCUSSION 

We have implemented a proteomics-based approach to identify proteins that
elicit a humoral response in pancreatic cancer patients. This approach
allows screening of patient sera by Western blot analysis for antibodies
that react against separated tumor cell proteins. This study focused on a
search for autoantibodies to pancreatic tumor proteins present in the
Panc-1 cancer cell line. We have shown that a humoral response directed
against calreticulin isoform 1 or 2, or both, occurred in 58.3% of
pancreatic cancer patients. One of 18 chronic pancreatitis patients (5.6%)
and 1 of 15 healthy controls (6.7%) demonstrated autoantibodies to
calreticulin isoform 1; none demonstrated autoantibodies to isoform 2. None
of the sera from patients with colon cancer exhibited reactivity against
these proteins. One of 14 (7.1%) sera from lung adenocarcinoma patients
demonstrated autoantibodies to calreticulin isoform 1, and 2 of 14 (14.3%)
demonstrated autoantibodies to isoform 2.

Calreticulin is an abundant, high-capacity Ca2+-binding protein found in the
endoplasmic reticulum (ER) lumen of most cells of human origin. It has been
shown to play a role in the regulation of a variety of cellular functions
within the ER lumen (chaperone functions and Ca2+ storage and signaling) and
in calreticulin-dependent modulation of cell adhesion and gene expression at
extra-ER sites (16). In particular, calreticulin interacts with N-linked
oligosaccharides on nascent proteins in the ER lumen, with Ca2+ binding
essential for this function.

It has been demonstrated that calreticulin elicits a humoral response in a
variety of autoimmune diseases (17). Peptides transported into the lumen of
the ER associate with calreticulin, as well as with protein disulfide
isomerase and gp96 (18). Moreover, calreticulin preparations purified from
tumors elicit specific immunity to the tumor used as the source of
calreticulin but not to an antigenically distinct tumor (19). This
immunogenicity has been attributed to the peptides associated with the
calreticulin molecule. The mechanism by which the calreticulin-peptide
complex elicits immunity is unknown. A number of epitopes of calreticulin
have been identified (20). The epitope eliciting the humoral response in
patients with autoimmune disease has been reported to be located in the
N-terminal part of the molecule. Calreticulin is a component of the MHC
class I peptide complex (21), and it has been demonstrated that calreticulin
elicits tumor- and peptide-specific immunity (19). Interestingly, it has
been shown that a truncated form of calreticulin elicits a humoral response
in hepatocellular carcinoma (22), with the reactive epitope located in the
…-terminal portion, whereas the intact protein did not elicit reactivity. In
our study, although the truncated form of calreticulin, CRT32, was present
in the Panc-1 tumor cell line, it did not elicit immunoreactivity. This
suggests that a specific mechanism of calreticulin processing may exist
during carcinogenesis and may differ between tumor types.

A prerequisite for an immune response against a cellular protein is its
presentation as an antigen. It is not clear why only a subset of patients
with a specific tumor type develop a humoral response to a particular
antigen. Immunogenicity may depend on the level of expression,
posttranslational modification, or other types of processing of a protein,
the extent of which may vary among tumors of a similar type. We have
demonstrated that calreticulin is not overexpressed in pancreatic tumor cell
lines at either the mRNA or protein level, compared with the lung, colon, or
ovarian tumor cell lines in our study. Thus, the immunoreactivity of
calreticulin is unlikely to be related to the level of protein expression.
Furthermore, we were unable to demonstrate aberrant N-linked glycosylation
of calreticulin in the pancreatic tumor cell lines (data not shown). It is
possible that the antigenicity to the calreticulin isoforms arises from the
aberrant expression of an unrelated protein in pancreatic cancer that
generates an epitope that cross-reacts with calreticulin.

Although the calreticulin autoantibodies were largely restricted to patients
with pancreatic cancer among the subject groups we have tested, additional
studies are needed to determine the specificity of the calreticulin
antibodies to pancreatic cancer. For example, although increased levels of
calreticulin antibodies were found in pancreatic cancer compared with
chronic pancreatitis and other control groups, the relationship between
tumor burden, tumor staging, and antibody levels needs additional
clarification. The utility of calreticulin autoantibodies as diagnostic
markers in pancreatic cancer also needs to be assessed in additional
studies. It is clear, however, that the proteomic approach that we have
implemented, which allows for the screening of native proteins as they are
expressed in tumor cells, has the potential to identify novel proteins that
may have clinical utility in cancer.